<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: On Write Caching Controllers and Distributed Database Failure Models</title>
	<atom:link href="http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/feed/" rel="self" type="application/rss+xml" />
	<link>http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/</link>
	<description>Code, sleep, have fun</description>
	<lastBuildDate>Wed, 20 Apr 2011 06:17:54 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.1</generator>
	<item>
		<title>By: Mark Leith</title>
		<link>http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/#comment-2782</link>
		<dc:creator>Mark Leith</dc:creator>
		<pubDate>Thu, 18 Dec 2008 11:29:04 +0000</pubDate>
		<guid isPermaLink="false">http://feedblog.org/?p=1802#comment-2782</guid>
		<description>Hey Kevin,

The new semi-sync stuff is now in 6.0.8 onwards. There was a feature preview on launchpad:

https://code.launchpad.net/%7Ehezx/mysql-server/semi-sync-replication

It&#039;s all fully documented now:

http://dev.mysql.com/doc/refman/6.0/en/replication-semisync.html

It&#039;s all plugin enabled now as well. :)</description>
		<content:encoded><![CDATA[<p>Hey Kevin,</p>
<p>The new semi-sync stuff is now in 6.0.8 onwards. There was a feature preview on launchpad:</p>
<p><a href="https://code.launchpad.net/%7Ehezx/mysql-server/semi-sync-replication" rel="nofollow">https://code.launchpad.net/%7Ehezx/mysql-server/semi-sync-replication</a></p>
<p>It&#8217;s all fully documented now:</p>
<p><a href="http://dev.mysql.com/doc/refman/6.0/en/replication-semisync.html" rel="nofollow">http://dev.mysql.com/doc/refman/6.0/en/replication-semisync.html</a></p>
<p>It&#8217;s all plugin enabled now as well. <img src='http://feedblog.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Callaghan</title>
		<link>http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/#comment-2781</link>
		<dc:creator>Mark Callaghan</dc:creator>
		<pubDate>Thu, 18 Dec 2008 11:06:18 +0000</pubDate>
		<guid isPermaLink="false">http://feedblog.org/?p=1802#comment-2781</guid>
		<description>Kevin,
The data is committed on the master. The protocol is:
1) commit on master (innodb and binlog)
2) wait for slave to ACK
3) return to client

During 2) the data is already committed on the master, so other connections can run and see the results of the commit while the committing client is still waiting for the ACK.

If you want better guarantees, then you need to use sync commit and that doesn&#039;t exist within MySQL today.

The referenced paper is about sync commit protocols that don&#039;t block when some servers fail. This is great functionality that should eventually be supported by MySQL. You don&#039;t want to get paged every time the master hardware fails and a new master must be elected, do you?</description>
		<content:encoded><![CDATA[<p>Kevin,<br />
The data is committed on the master. The protocol is:<br />
1) commit on master (innodb and binlog)<br />
2) wait for slave to ACK<br />
3) return to client</p>
<p>During 2) the data is already committed on the master, so other connections can run and see the results of the commit while the committing client is still waiting for the ACK.</p>
<p>If you want better guarantees, then you need to use sync commit and that doesn&#8217;t exist within MySQL today.</p>
<p>The referenced paper is about sync commit protocols that don&#8217;t block when some servers fail. This is great functionality that should eventually be supported by MySQL. You don&#8217;t want to get paged every time the master hardware fails and a new master must be elected, do you?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: burtonator</title>
		<link>http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/#comment-2779</link>
		<dc:creator>burtonator</dc:creator>
		<pubDate>Thu, 18 Dec 2008 05:56:25 +0000</pubDate>
		<guid isPermaLink="false">http://feedblog.org/?p=1802#comment-2779</guid>
		<description>&quot;BBU will be faster with standard 1Gbit ethernet, Usually you can do 10000-20000 fsync from single thread to the NVRAM with ethernet it is lower.&quot;

For InnoDB I think you&#039;re right.... I&#039;m trying to figure out by how much.

&quot;Speaking about price I think Google uses BBU for MySQL Boxes but not for Search boxes which is where crap hardware is used.&quot;

Yes... this is my assessment as well. This is why I think they&#039;re using distributed WAL replication and multiple datacenters so that they can buy ultra-cheap hardware.

&quot;If you’re looking at the crappy box the RAID is often expensive, in particular because vendors charge large premium on it. If you have better box the added cost is much lower. For example getting $5000 Dell (with 8 HDD etc) the cost factor of RAID controller in price will be $200 or so, which is closer to 4% than 20%&quot;

I guess it depends on the card. The MegaRAID cards we&#039;re looking at are $500 or so, so that&#039;s 10% of the box.

In the actual machines we&#039;re thinking of getting, they&#039;re 15%. We&#039;re not buying low-end boxes, but they&#039;re commodity hardware, which is why they are so cheap. GREAT boxes though... I know two startups running clusters of &gt; 500 nodes on this hardware. Basically Supermicro with Seagate HDDs.</description>
		<content:encoded><![CDATA[<p>&#8220;BBU will be faster with standard 1Gbit ethernet, Usually you can do 10000-20000 fsync from single thread to the NVRAM with ethernet it is lower.&#8221;</p>
<p>For InnoDB I think you&#8217;re right&#8230;. I&#8217;m trying to figure out by how much.</p>
<p>&#8220;Speaking about price I think Google uses BBU for MySQL Boxes but not for Search boxes which is where crap hardware is used.&#8221;</p>
<p>Yes&#8230; this is my assessment as well. This is why I think they&#8217;re using distributed WAL replication and multiple datacenters so that they can buy ultra-cheap hardware.</p>
<p>&#8220;If you’re looking at the crappy box the RAID is often expensive, in particular because vendors charge large premium on it. If you have better box the added cost is much lower. For example getting $5000 Dell (with 8 HDD etc) the cost factor of RAID controller in price will be $200 or so, which is closer to 4% than 20%&#8221;</p>
<p>I guess it depends on the card. The MegaRAID cards we&#8217;re looking at are $500 or so, so that&#8217;s 10% of the box.</p>
<p>In the actual machines we&#8217;re thinking of getting, they&#8217;re 15%. We&#8217;re not buying low-end boxes, but they&#8217;re commodity hardware, which is why they are so cheap. GREAT boxes though&#8230; I know two startups running clusters of &gt; 500 nodes on this hardware. Basically Supermicro with Seagate HDDs.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: burtonator</title>
		<link>http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/#comment-2778</link>
		<dc:creator>burtonator</dc:creator>
		<pubDate>Thu, 18 Dec 2008 05:46:08 +0000</pubDate>
		<guid isPermaLink="false">http://feedblog.org/?p=1802#comment-2778</guid>
		<description>Hey Mark.

I didn&#039;t see that the MySQL replication team had a refactored patch.  Is that anywhere public?

I&#039;m aware that the writer waits for the entire replication chain before being returned.

In our situation we do bulk inserts of 50-100 records with INSERT ... ON DUPLICATE KEY UPDATE, so our throughput will be pretty much the same. Plus I pretty much get group commit.

I get to cheat on this stuff when writing crawlers :)

I didn&#039;t realize that reads from the master can see the uncommitted data. Makes sense though... Interesting.</description>
		<content:encoded><![CDATA[<p>Hey Mark.</p>
<p>I didn&#8217;t see that the MySQL replication team had a refactored patch.  Is that anywhere public?</p>
<p>I&#8217;m aware that the writer waits for the entire replication chain before being returned.</p>
<p>In our situation we do bulk inserts of 50-100 records with <code>INSERT ... ON DUPLICATE KEY UPDATE</code>, so our throughput will be pretty much the same. Plus I pretty much get group commit.</p>
<p>I get to cheat on this stuff when writing crawlers <img src='http://feedblog.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>I didn&#8217;t realize that reads from the master can see the uncommitted data. Makes sense though&#8230; Interesting.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mark Callaghan</title>
		<link>http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/#comment-2777</link>
		<dc:creator>Mark Callaghan</dc:creator>
		<pubDate>Thu, 18 Dec 2008 04:36:41 +0000</pubDate>
		<guid isPermaLink="false">http://feedblog.org/?p=1802#comment-2777</guid>
		<description>Kevin,

The semi sync patch, while very nice, and even nicer in the refactored version done by the MySQL replication team, might not do all that you expect.

It defers return to the user from commit until at least one slave ACKs the transaction&#039;s binlog events. That is, commit occurs on the master (commit to InnoDB, write events to the binlog) and then the user&#039;s connection waits while threads copy the binlog events to slaves and wait for an ACK to wake the user&#039;s connection. While the user&#039;s connection is waiting other client connections may observe the committed data.

This still has useful properties. For example, it rate-limits a busy client, preventing it from getting the master far ahead of the slaves. N busy clients can get the master N transactions ahead of the slaves, but they can&#039;t get it N+1 transactions ahead: a client can have at most 1 transaction on the master that has not been ACKed by a slave.

This is a step in the right direction but it isn&#039;t sync replication. Fortunately, MySQL hired 2 experts on sync replication into the MySQL replication team so we may get some interesting features in the future.</description>
		<content:encoded><![CDATA[<p>Kevin,</p>
<p>The semi sync patch, while very nice, and even nicer in the refactored version done by the MySQL replication team, might not do all that you expect.</p>
<p>It defers return to the user from commit until at least one slave ACKs the transaction&#8217;s binlog events. That is, commit occurs on the master (commit to InnoDB, write events to the binlog) and then the user&#8217;s connection waits while threads copy the binlog events to slaves and wait for an ACK to wake the user&#8217;s connection. While the user&#8217;s connection is waiting other client connections may observe the committed data.</p>
<p>This still has useful properties. For example, it rate-limits a busy client, preventing it from getting the master far ahead of the slaves. N busy clients can get the master N transactions ahead of the slaves, but they can&#8217;t get it N+1 transactions ahead: a client can have at most 1 transaction on the master that has not been ACKed by a slave.</p>
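<p>The lag bound described above can be checked with a toy model. This is a minimal Python sketch, not MySQL code, and it makes three assumptions: a single slave ACKs transactions in binlog order, each client has at most one commit in flight, and <code>simulate_semisync</code> is a hypothetical helper. It walks through the protocol steps (commit on the master, wait for an ACK, return to the client) and reports the largest gap between master commits and slave ACKs.</p>

```python
import random


def simulate_semisync(num_clients: int, steps: int, seed: int = 0) -> int:
    """Toy semi-sync model; returns the largest observed (committed - acked) gap."""
    rng = random.Random(seed)
    committed = 0        # transactions committed on the master (InnoDB + binlog)
    acked = 0            # transactions the slave has ACKed, in binlog order
    idle = num_clients   # clients free to start a new transaction
    waiting = []         # sequence numbers of commits whose clients await an ACK
    max_lag = 0
    for _ in range(steps):
        if idle and rng.random() < 0.7:
            idle -= 1
            committed += 1             # step 1: commit; other connections can now see it
            waiting.append(committed)  # step 2: this client blocks until its commit is ACKed
        elif acked < committed:
            acked += 1                 # slave ACKs the next transaction's binlog events
            released = [seq for seq in waiting if seq <= acked]
            waiting = [seq for seq in waiting if seq > acked]
            idle += len(released)      # step 3: commit returns to those clients
        max_lag = max(max_lag, committed - acked)
    return max_lag


for n in (1, 4, 16):
    print(n, simulate_semisync(n, 10_000))
```

<p>Each printed lag should never exceed the number of clients. A transaction stays unACKed only while its own client is blocked on it, so the unACKed count always equals the number of waiting clients.</p>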
<p>This is a step in the right direction but it isn&#8217;t sync replication. Fortunately, MySQL hired 2 experts on sync replication into the MySQL replication team so we may get some interesting features in the future.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Peter Zaitsev</title>
		<link>http://feedblog.org/2008/12/17/on-write-caching-controllers-and-distributed-database-failure-models/#comment-2776</link>
		<dc:creator>Peter Zaitsev</dc:creator>
		<pubDate>Thu, 18 Dec 2008 04:08:44 +0000</pubDate>
		<guid isPermaLink="false">http://feedblog.org/?p=1802#comment-2776</guid>
		<description>Kevin,

A BBU will be faster than standard 1Gbit Ethernet. Usually you can do 10000-20000 fsyncs per second from a single thread to the NVRAM; over Ethernet it is lower.

Speaking of price, I think Google uses BBUs for MySQL boxes but not for search boxes, which is where the crap hardware is used.

If you&#039;re looking at a crappy box, the RAID controller is often expensive, in particular because vendors charge a large premium on it. If you have a better box, the added cost is much lower. For example, on a $5000 Dell (with 8 HDDs, etc.) the RAID controller will account for $200 or so of the price, which is closer to 4% than 20%.

It is a bit hard to judge, though, as component-level cost may be very different from whole-system cost. And if you&#039;re dealing with large volumes, it is the component cost that matters when you see what price you can get from a vendor.</description>
		<content:encoded><![CDATA[<p>Kevin,</p>
<p>A BBU will be faster than standard 1Gbit Ethernet. Usually you can do 10000-20000 fsyncs per second from a single thread to the NVRAM; over Ethernet it is lower.</p>
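<p>A small append-and-fsync loop gives a rough feel for the single-thread fsync rate on your own hardware. This is a minimal Python sketch, assuming a POSIX system, and <code>fsync_rate</code> is a hypothetical helper. The number it reports depends on the filesystem, the mount options, and whether the controller's write cache (and its BBU) is enabled.</p>

```python
import os
import tempfile
import time
from typing import Optional


def fsync_rate(iterations: int = 1000, directory: Optional[str] = None) -> float:
    """Append a small record and fsync it `iterations` times; return fsyncs/sec."""
    record = b"x" * 512                          # roughly one small log write
    fd, path = tempfile.mkstemp(dir=directory)   # place the file on the device under test
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            os.write(fd, record)
            os.fsync(fd)                         # block until the kernel has flushed it to the device
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return iterations / elapsed
```

<p>The default temporary directory may sit on tmpfs, which would give a meaningless result. Pass a <code>directory</code> on the volume you care about, for example the one holding the database files. Any such path is only an example.</p>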
<p>Speaking of price, I think Google uses BBUs for MySQL boxes but not for search boxes, which is where the crap hardware is used.</p>
<p>If you&#8217;re looking at a crappy box, the RAID controller is often expensive, in particular because vendors charge a large premium on it. If you have a better box, the added cost is much lower. For example, on a $5000 Dell (with 8 HDDs, etc.) the RAID controller will account for $200 or so of the price, which is closer to 4% than 20%.</p>
<p>It is a bit hard to judge, though, as component-level cost may be very different from whole-system cost. And if you&#8217;re dealing with large volumes, it is the component cost that matters when you see what price you can get from a vendor.</p>
]]></content:encoded>
	</item>
</channel>
</rss>
