Tuesday, April 30, 2013

On bitcoin data spam, and evil data

What happens if somebody puts evil data in the blockchain?  What responses are available?

It is a truly awful situation, and difficult to address.

What happened?

The easiest way to explain what happened here is through analogy. Imagine if someone picked a penny stock on the NYSE and made a sequence of apparently pointless trades. Then they announced that the prices of their stock trades actually encoded links to some "evil" websites. You know, maybe $0.01 means "a" and $0.02 means "b", etc. Stock market tickers are public, lots of places archive that data, so now lots of people have "links to evil data". Except really they don't. What they have is a list of stock trades. You'd need special software to turn that into some other kind of data.

This is what someone has done with Bitcoin. They sent a series of monetary transactions that did not represent real payments, and then announced that with a special program you could turn them back into some text. That text then contains links to, well, I don't actually know what because I haven't looked. But let's assume it's bad stuff.

What solutions are available?  Software update?

The answer is very complex, with implications that travel to the heart of bitcoin's value.

Sending bitcoins requires two pieces of data: a bitcoin address, and an amount (number of bitcoins).  There is no "comments field" or anything of that nature.  A bitcoin address is just a 20-byte piece of data.  Normally those 20 bytes are the RIPEMD160 hash of the SHA256 hash of a public key, but nobody can distinguish a genuine hash from 20 arbitrary bytes.  Therefore, if you are willing to waste money (albeit very small fractions, like 0.00000001 bitcoins) by sending it to addresses for which no private key exists, you have essentially created a channel for arbitrary data transmission.
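To make that data channel concrete, here is a minimal Python sketch, assuming the standard version-0 Base58Check address format. It cuts arbitrary bytes into 20-byte chunks and encodes each chunk as an ordinary-looking address; the helper names `data_to_addresses` and `addresses_to_data` are hypothetical, for illustration only. Nothing in the resulting strings reveals that no private key exists for them.

```python
from __future__ import annotations

import hashlib

# Base58 alphabet used by Bitcoin addresses (omits 0, O, I and l).
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def _checksum(data: bytes) -> bytes:
    # First 4 bytes of SHA256(SHA256(data)).
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]


def base58check_encode(version: int, payload: bytes) -> str:
    """Encode version byte + payload + 4-byte checksum as Base58."""
    full = bytes([version]) + payload
    full += _checksum(full)
    n = int.from_bytes(full, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    # Each leading zero byte is written as a literal '1'.
    pad = len(full) - len(full.lstrip(b"\x00"))
    return "1" * pad + out


def base58check_decode(address: str) -> tuple[int, bytes]:
    """Return (version, payload), raising ValueError on a bad checksum."""
    n = 0
    for ch in address:
        n = n * 58 + B58_ALPHABET.index(ch)
    pad = len(address) - len(address.lstrip("1"))
    full = b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")
    data, check = full[:-4], full[-4:]
    if _checksum(data) != check:
        raise ValueError("bad checksum")
    return data[0], data[1:]


def data_to_addresses(message: bytes) -> list[str]:
    """Hypothetical helper: pack bytes into 20-byte 'address hashes'."""
    chunks = [message[i:i + 20].ljust(20, b"\x00")  # zero-pad last chunk
              for i in range(0, len(message), 20)]
    return [base58check_encode(0x00, c) for c in chunks]


def addresses_to_data(addresses: list[str]) -> bytes:
    """Hypothetical helper: reverse data_to_addresses."""
    raw = b"".join(base58check_decode(a)[1] for a in addresses)
    # Strips the zero padding (and any genuine trailing zero bytes).
    return raw.rstrip(b"\x00")
```

For example, `addresses_to_data(data_to_addresses(b"hello, blockchain"))` returns `b'hello, blockchain'`. Sending one satoshi to each generated address is all it takes to publish the text.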

The bitcoin blockchain is in one sense a massively replicated ~7GB database that stores data for all eternity.  There remains the open question of what happens if somebody dumps data into the blockchain, unrelated to currency.  Maybe a government finds that data illegal.  Smart people argue that the legal doctrine of mens rea and similar mitigating factors apply.  But it remains an unknown.  Meanwhile, the vast majority of users must store and relay this awful data they don't care about, simply to use the bitcoin payment system they do care about.

There are many conflicting motives and incentives (very Brave New War-ish):

  • Anarchist activists want to publish this information, to force authorities to act (or not) when this illegal data is published.
  • Bitcoin activists want to publish this information, to force developers (us) to address The Filter Issue (see below).
  • Some people see more value in bitcoin as "eternity data storage", if expensive and inefficient, than bitcoin as a currency.
  • It is, quite literally, impossible to prevent use of bitcoin for data transmission.  It is a purely digital currency.  Who can say which digits are "evil" or "good", allowed or disallowed?  You can detect certain patterns, and possibly filter those.
  • Many bitcoin users are using bitcoin for its intended purpose, as currency transfer, and dislike carrying the costs for these data transmission uses.
  • As data-carrying use grows, it increases the storage and bandwidth costs for anyone running a P2P node on the all-volunteer bitcoin P2P network.  This shrinks the total number of bitcoin P2P nodes.
  • As such, due to both legal and resource-usage issues, "data spam" has long been theorized as an attack vector.


The "Filter Issue"

There are very large ramifications to filtering out transactions, even ones that are obviously data spam.

Fungibility: currently, all bitcoins have the same value.  My 1.0 BTC and your 1.0 BTC are equivalent in value.  Once you start filtering transactions, you are injecting policy-based censorship into the mix. Some bitcoins are accepted by all, some bitcoins are only accepted by a few.  The value of a bitcoin itself becomes a product of its ancestry.  If such a policy is implemented, perhaps by court order to a bitcoin mining pool, it could lead to chain forks where, e.g., bitcoin users in the United States see a different set of spendable bitcoins than users outside the US.  That would be a disaster for bitcoin.

It is widely speculated, based on common forum comments in the crypto-anarchist community, that this current round of data spam is intended to force bitcoin users, developers and governments of the world to take action to censor (or not) certain bitcoin transactions, forcing the issue to establish a precedent one way or the other.  Or, more pessimistically, a party could simply be trying to shut down bitcoin.

The bitcoin community is very staunchly anti-censorship, but if data spam were to threaten the life of bitcoin, I imagine ideology-neutral "it looks like data, not currency" filtering might appear.  Bitcoin is ultimately a product of voting -- you vote by choosing which software version and software ruleset to download.
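What might "it looks like data, not currency" filtering look like? Below is one purely hypothetical Python heuristic, not any rule actually proposed: it flags a transaction when the 20-byte payloads of its outputs are mostly printable ASCII. A real hash produces a printable byte (0x20 through 0x7E, 95 of 256 values) only about 37% of the time, so plain text stands out sharply. The 0.9 threshold and the function names are assumptions for illustration.

```python
from __future__ import annotations

# Printable ASCII: space (0x20) through '~' (0x7E).
PRINTABLE = set(range(0x20, 0x7F))


def printable_fraction(payload: bytes) -> float:
    """Fraction of bytes in payload that are printable ASCII."""
    if not payload:
        return 0.0
    return sum(b in PRINTABLE for b in payload) / len(payload)


def looks_like_text(payloads: list[bytes], threshold: float = 0.9) -> bool:
    """Hypothetical filter: flag a tx whose output payloads read as text.

    threshold is an illustrative assumption, not a real policy value.
    """
    return printable_fraction(b"".join(payloads)) >= threshold
```

The weakness is exactly the one noted earlier: anyone who encrypts or compresses the data first makes it look as random as a genuine hash, so pattern filters like this only catch the unsophisticated cases.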

The users can always vote data spam off the island...  but will they? Is data transmission a valid use of bitcoin?  The users themselves choose the definition of "valid."

What solutions could be deployed right now?

One measure currently being discussed is to stop relaying economically worthless bitcoin transactions (under $0.0001, say).  Sending out lots of data would then require spending more per transaction, directly raising the cost.
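A minimal Python sketch of that relay rule and its effect on the spammer's costs follows. Assumptions, stated loudly: the `TxOut` type and function names are hypothetical, and the 546-satoshi cutoff is only an illustrative figure, not a statement of the exact policy under discussion. The arithmetic shows the point: at 20 bytes per output, 1,000 bytes need 50 outputs, costing 50 satoshis at 1 satoshi each but 27,300 satoshis (0.000273 BTC) at the cutoff, before any transaction fees.

```python
from __future__ import annotations

from dataclasses import dataclass

# Illustrative cutoff in satoshis (1 BTC = 100,000,000 satoshis).
# Hypothetical value for this sketch only.
DUST_THRESHOLD_SATOSHIS = 546


@dataclass
class TxOut:
    value: int      # amount in satoshis
    payload: bytes  # 20-byte address hash (or smuggled data)


def should_relay(outputs: list[TxOut],
                 threshold: int = DUST_THRESHOLD_SATOSHIS) -> bool:
    """Refuse to relay a transaction if any output is below threshold."""
    return all(o.value >= threshold for o in outputs)


def min_cost_to_embed(num_bytes: int,
                      threshold: int = DUST_THRESHOLD_SATOSHIS) -> int:
    """Minimum satoshis sent to carry num_bytes at 20 bytes per output.

    Ignores transaction fees; counts only the value of the outputs.
    """
    outputs_needed = -(-num_bytes // 20)  # ceiling division
    return outputs_needed * threshold
```

For example, `should_relay([TxOut(1, b"\x00" * 20)])` is `False`, while `min_cost_to_embed(1000, threshold=1)` is `50` and `min_cost_to_embed(1000)` is `27300`. The filter does not stop data transmission; it only makes each byte more expensive.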


See Gregory Maxwell's post, "to prevent arbitrary data storage in txouts — The Ultimate Solution" for a proposed solution.