Tuesday, December 16, 2014

Open development processes and reddit kerkluffles

It can be useful to review open source development processes from time to time.  This reddit thread[1] serves us both as a case study and as an introduction to OSS process for newbies.
[1] http://www.reddit.com/r/Bitcoin/comments/2pd0zy/peter_todd_is_saying_shoddy_development_on_v010/

Dirty Laundry

When building businesses or commercial software projects, outsiders typically hear little about the internals of project development.  The public only hears what the companies release, which is prepped and polished.  Internal disagreements, schedule slips, and engineer fistfights all go unseen.

Open source development is the opposite.  The goal is radical transparency.  Inevitably there is private chatter (0day bugs etc.), but the default is openness.  This means that it is normal practice to "air dirty laundry in public."  Engineers will disagree, sometimes quietly, sometimes loudly, sometimes rudely and with ad hominem attacks.  On the Internet, there is a pile-on effect, where informed and uninformed supporters add their 0.02 BTC.

Competing interests cloud the issues further.  As a technology matures, its engineers are typically employed by organizations.  Those organizations have different strategies and motivations, and they sponsor work they find beneficial.  Sometimes those orgs are non-profit foundations, sometimes for-profit corporations.  Sometimes that work is maintenance ("keep it running"), sometimes it is developing new, competitive features that a company feels will give it a better market position.  In a transparent development environment, all parties are hyperaware of these competing interests.  As a result, Internet natterers painstakingly document and repeat every conspiracy theory about the Bitcoin Foundation, Blockstream, BitPay, various altcoin developers, and more.

Bitcoin and altcoin development adds an interesting new dimension.  Sometimes engineers have a more direct conflict of interest, in that the technology they are developing is also potentially their road to instant $millions.  Investors, amateur and professional, have direct stakes in a certain coin or coin technology.  Engineers also have an emotional stake in technology they design and nurture.  This creates incentives for supporters of a non-bitcoin technology to work very hard to thump bitcoin, and vice versa.  Even inside bitcoin, you see "tree chains vs. side chains" threads of a similar stripe.  This can lead to a very skewed debate.

That should not distract from the engineering discussion.  Starting from first principles, Assume Good Faith[2].  Most engineers in open source tend to mean what they say.  Typically they speak for themselves first, and their employers value that engineer's freedom of opinion.  Pay more attention to the engineers actually working on the technology, and less to the noise bubbling around the Internet like the kindergarten game of grapevine.
[2] http://en.wikipedia.org/wiki/Wikipedia:Assume_good_faith

Being open and transparent means engineering disagreements happen in public.  This is normal.  Open source engineers live an aquarium life[3].
[3] https://www.youtube.com/watch?v=QKe-aO44R7k

What the fork?

In this case, a tweet suggests consensus bug risks, which reddit account "treeorsidechains" hyperbolizes into a dramatic headline[1].  However, the headline would seem to be the opposite of the truth.  Several changes were merged during 0.10 development which move snippets of source code into new files and new sub-directories.  The general direction of this work is creating a "libconsensus" library that carefully encapsulates consensus code in a manner usable by external projects.  This is a good thing.

The development was performed quite responsibly:  Multiple developers would verify each cosmetic change, ensuring no behavior changes had been accidentally (or maliciously!) introduced.  Each pull request receives a full multi-platform build + automated testing, over and above individual dev testing.  Comparisons at the assembly language level were sometimes made in critical areas, to ensure zero before-and-after change.  Each transformation gets the Bitcoin Core codebase to a more sustainable, more reusable state.

Certainly zero-change is the most conservative approach. Strictly speaking, that has the lowest consensus risk.  But that is a short term mentality.  Both Bitcoin Core and the larger ecosystem will benefit when the "hairball" pile of source code is cleaned up.  Progress has been made on that front in the past 2 years, and continues.   Long term, combined with the "libconsensus" work, that leads to less community-wide risk.

The key is balance.  Continue software engineering practices -- like those just mentioned above -- that enable change with least consensus risk.  Part of those practices is review at each step of the development process:  social media thought bubble, mailing list post, pull request, git merge, pre-release & release.  It probably seems chaotic at times.  In effect, git[hub] and the Internet enable a dynamic system of review and feedback, where each stage provides a check-and-balance for bad ideas and bad software changes.  It's a human process, designed to acknowledge and handle the fact that human engineers are fallible and might make mistakes (or be coerced/under duress!).  History and field experience will be the ultimate judge, but I think Bitcoin Core is doing well on this score, all things considered.

At the end of the day, while no change is without risk, version 0.10 work was done with attention to consensus risk at multiple levels (not just short term).

Technical and social debt

Working on the Linux kernel was an interesting experience that combined git-driven parallel development and a similar source code hairball.  One of the things that quickly became apparent is that cosmetic patches, especially code movement, were hugely disruptive.  Some even termed them anti-social.  To understand why, it is important to consider how modern software changes are developed:

Developers work in parallel on their personal computers to develop change XYZ, then submit it "upstream" as a github pull request.  Then time passes.  If code movement and refactoring changes are accepted upstream before XYZ, the developer is forced to update XYZ (typically with trivial fixes), then re-review and re-test it to ensure it remains in a known-working state.

Seemingly cosmetic changes such as code movement have a ripple effect on participating developers and the wider developer community.  Every developer whose work is not immediately merged upstream must bear the costs of updating their unmerged work.
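A rough way to picture this ripple effect: given the files touched by a code-movement change and the files touched by each open pull request, list the pull requests whose authors will likely need to rebase.  This is a minimal Python sketch, assuming file overlap as a proxy for merge conflicts.  git actually conflicts at the line level, and a moved function can break a pull request that never touches the moved file, so the proxy may both over- and under-count.  The pull request IDs and file names are hypothetical.

```python
from typing import Dict, List, Set


def prs_needing_rebase(moved_files: Set[str],
                       open_prs: Dict[str, Set[str]]) -> List[str]:
    """Return the IDs of open pull requests that touch any file modified by a
    code-movement change.  File overlap is only a proxy for a real conflict."""
    return sorted(pr for pr, files in open_prs.items() if files & moved_files)


# Hypothetical code-movement change that splits main.cpp / main.h.
moved = {"src/main.cpp", "src/main.h"}

# Hypothetical open pull requests and the files each one modifies.
open_prs = {
    "pr-a": {"src/main.cpp", "src/wallet.cpp"},
    "pr-b": {"src/net.cpp"},
    "pr-c": {"src/main.h"},
}

print(prs_needing_rebase(moved, open_prs))  # ['pr-a', 'pr-c']
```

In this toy case a single merge creates rebase, re-review and re-test work for two of the three outstanding authors, none of whom wrote the change.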

Normally, this is expected.  Encouraging developers to build on top of "upstream" produces virtuous cycles.

However, a constant stream of code movement and cosmetic changes may produce a constant stream of disruption to developers working on non-trivial features that take a bit longer to develop before going upstream.  Trivial changes become encouraged, and non-trivial changes face a binary choice of (a) be merged immediately or (b) bear added re-base, re-view, re-test costs.

Taken over a timescale of months, I argue that a steady stream of cosmetic code movement changes serves as a disincentive to developers working with upstream.  Each upstream breakage has a ripple effect to all developers downstream, and imposes some added chance of newly introduced bugs on downstream developers.  I'll call this "social debt", a sort of technical debt[4] for developers.
[4] http://en.wikipedia.org/wiki/Technical_debt

As mentioned above, the libconsensus and code movement work is a net gain.  The codebase needs cleaning up.  Each change however incurs a little bit of social debt.  Life is a little bit harder on people trying to get work into the tree.  Developers are a little bit more discouraged at the busy-work they must perform.  Non-trivial pull requests take a little bit longer to approve, because they take a little bit more work to rebase (again).

A steady flow of code movement and cosmetic breakage into the tree may be a net gain, but it also incurs a lot of social debt.  In such situations, developers find that tested, working out-of-tree code repeatedly stops working during the process of trying to get that work in-tree.  Taken over time, it discourages working on the tree.  It is rational to sit back, not work on the tree, let the breakage stop, and then pick up the pieces.

Paradox Unwound

Bitcoin Core, then, is pulled in opposite directions by a familiar problem.  It is generally agreed that the codebase needs further refactoring.  That's not just isolated engineer nit-picking.  However, for non-trivial projects, refactoring is always anti-social in the short term.  It impacts projects other than your own, projects you don't even know about. One change causes work for N developers.  Given these twin opposing goals, the key, as ever, is finding the right balance.

Much like "feature freeze" in other software projects, developing a policy that opens and closes windows for code movement and major disruptive changes seems prudent.  One week of code movement & cosmetics followed by 3 weeks without, for example.  Part of open source parallel development is social signalling:  Signal to developers when certain changes are favored or not, then trust they can handle the rest from there.
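Such a policy is easy to state mechanically.  A minimal sketch, assuming a repeating 4-week cycle that starts on a project-chosen date, with the first week open to code movement and cosmetic changes.  The `cycle_start` anchor, the function name and the example dates are hypothetical.

```python
from datetime import date

CYCLE_DAYS = 28  # assumed cycle length: 4 weeks
OPEN_DAYS = 7    # assumed open window: the first week of each cycle


def code_movement_window_open(day: date, cycle_start: date) -> bool:
    """Return True if `day` falls in the open week of the repeating cycle.

    Python's % always returns a non-negative result for a positive divisor,
    so dates before `cycle_start` are also mapped onto the cycle correctly.
    """
    offset = (day - cycle_start).days % CYCLE_DAYS
    return offset < OPEN_DAYS


start = date(2015, 1, 5)  # hypothetical cycle anchor
for d in (date(2015, 1, 5), date(2015, 1, 12), date(2015, 2, 2)):
    print(d, code_movement_window_open(d, start))
```

The signalling value comes from publishing the schedule in advance: developers with long-running work know when breakage is coming and can plan rebases around it.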

While recent code movement commits themselves are individually ACK-worthy, professionally executed and moving towards a positive goal, I think the project could strike a better balance when it comes to disruptive cosmetic changes, a balance that better encourages developers to work on more involved Bitcoin Core projects.

Friday, December 12, 2014

Survey of largest Internet companies, and bitcoin

Status report: Internet companies & bitcoin

Considering the recent news of Microsoft accepting bitcoin as payment for some digital goods, it seemed worthwhile to make a quick status check.  Wikipedia helpfully supplies a list of the largest Internet companies.  Let's take that list on a case-by-case basis.

Amazon.  As I blogged earlier, Amazon seems likely to be a slower mover on bitcoin.

Google.  Internally, there is factional interest.  Some internal fans, some internal critics.  Externally, very little.  Eric Schmidt has said good things about bitcoin.  Core developer Mike Hearn worked on bitcoin projects with the approval of senior management.

eBay.  Actively considering bitcoin integration.  Produced an explainer video on bitcoin.

Tencent. Nothing known.  Historical note:  Tencent, QQ, and bitcoin (CNN)

Alibaba.  Seemingly hostile, based on government pressure.  "Alibaba bans Bitcoin"

Facebook.  Nothing known.

Rakuten.  US subsidiary accepts bitcoin.

Priceline.  Nothing known.  Given that competitors Expedia and CheapAir accept bitcoin, it seems like momentum is building in that industry.

Baidu.  Presumed bitcoin-positive.  Briefly flirted with bitcoin, before government stepped in.

Yahoo.  Nothing known at the corporate level.  Their finance product displays bitcoin prices.

Salesforce.  Nothing known.  Third parties such as AltInvoice provide bitcoin integration through plugins.

Yandex. Presumed bitcoin-positive.  They launched a bitcoin conversion tool before their competitors.  Some critics suggest Yandex Money competes with bitcoin.

By my count, 6 out of 12 of the largest Internet companies have publicly indicated some level of involvement with bitcoin.

Similar lists may be produced by looking at the largest technology companies, and excluding electronics manufacturers.  Microsoft and IBM clearly top the list, both moving publicly into bitcoin and blockchain technology.

Wednesday, November 5, 2014

Prediction: GOP in 2014, Democrat WH in 2016

Consider today's US mid-term election results neutrally:

  • When one party controls both houses of Congress, that party will become giddy with power and over-reach.
  • When one party reaches minority status in both houses, that party resorts to tactics it previously condemned ("nuclear option").
  • It gets ugly when one party controls Congress, and another party controls the White House.

The most recent example is Bush 43 with a Democratic Congress, but that is only the latest of many.

Typical results from this sort of situation:
  • An orgy of hearings.
  • A raft of long-delayed "red meat for the base" bills will be passed in short order.
  • Political theatre raised one level:  Congress will pass bills it knows the President will veto (and knows cannot achieve a veto override).
  • A 2-house minority party becomes the obstructionist Party Of No.
A party flush with power simply cannot resist over-reach.  Democrats and Republicans both have proven this true time and again.

As such, we must consider timing.  GOP won the mid-terms, giving them two years to over-reach before the 2016 general election.  Voters will be tired of the over-reach, and the pendulum will swing back.

Predicted result:  Democrats take the White House in 2016.

If the 2014 mid-term elections had been the 2016 election, we would be looking at a full sweep, with GOP in House, Senate and White House.

Losing could be the best thing Democrats did for themselves in 2014.

P.S. Secondary prediction:  ACA will not be repealed.  ACA repeal bill will be voted upon, but will not make it to the President's desk.

Monday, July 7, 2014

What should the news write about, today?

I have always wanted to be a journalist.  As a youth, it seemed terribly entertaining, jet-setting around the world chasing stories by day.  Living like Ernest Hemingway by night.  Taking [some of the first in the world] web monkey jobs at Georgia Tech's Technique and CNN gave me the opportunity to learn the news business from the inside.

Fundamentally, from an engineering perspective, the news business is out of sync with actual news events.

News events happen in real time.  There are bursts of information.  Clusters of events happen within a short span of time.  There is more news on some days, less news on others.

The news business demands content, visits, links, shares, likes, follows, trends.  Assembly line production demands regularized schedules; deadlines.  Deadlines imply a story must be written, even if there is no story to write.

Today's news business is incredibly cutthroat.  Old dinosaurs are thrashing about.  Young upstarts are too.  My two young children only know of newspapers from children's storybooks, and icons on their Android tablets.  Classified ads, once a traditional revenue driver, have gone the way of the Internet.  Many "newspapers" are largely point-and-click template affairs, with a little local reporting thrown in.  Robots auto-post every press release.  Content stealing abounds.

All these inherent barriers exist for those brave few journalists left on the robot battlefield.  As usual with any industry that is being automated, the key to staying ahead is doing things that humans are good at, but robots are not: creativity, inventiveness, curiosity, detective work.  Avoiding herds, cargo cults, bike shedding, conventional wisdom.

In my ideal world, news sites would post more news, and look and feel a bit different, on days and weeks when there is a lot of news.  On slow news days, the site/app should feel like it's a slow news day.

The "it bleeds, it leads" pattern is worn out, and must be thrown in the rubbish.

Every day, every week, when a reporter is met with the challenge of meeting a deadline to feed the content beast, the primary question should be:  What recent trends/events impact the biggest percentage of your audience?

Pick any "mainstream" news site.  How many stories impact those beyond the immediate protagonists/antagonists/victims/authorities involved?

The news business, by its very nature, obscures and disincentivizes reporting on deep, impactful, and probably boring trends shaping our lives.  The biggest changes that happen to the human race are largely apparent in hindsight, looking back over the decades or hundreds or thousands of years.

The good stories are always the hardest to find.  Every "news maker" has the incentive to puff their accomplishments, and hide their failures.  Scientists have the same incentives (sadly):  Science needs negative feedback ("this theory/test failed!") yet there are few incentives to publish that.  Reporters must seek and tell the untold story, not the story everyone already knows.

Journalists of 20+ years ago were information gateways.  Selecting which bit of information to publish, or not, was a key editorial power.  Now, with the Internet, the practice is "publish all, sift later."  Today's journalists must reinvent themselves as modern detectives, versus the information gateways and "filters" of past decades.

Friday, June 13, 2014

Bitcoin and 51% mining power

Meta: This doesn't cover all incentives. More a high level reminder for new folks.

Tweet: #bitcoin mining market under-studied & interesting. Where else can 50% market leaders disappear, and market adjusts in real time?


Bitcoin mining pools are entities that serve to aggregate the security services provided by bitcoin mining hardware owned by individuals all over the world.  These mining pools execute bitcoin monetary policy -- they are the key network entities that select transactions to be included in The Official Timeline of Bitcoin Transactions (the blockchain).

The companies and individuals that own bitcoin mining hardware form a second tier in the market.  These miners choose an aggregator (mining pool) to which they provide computing services, in exchange for bitcoin payments.

The unique and interesting bit is that these second-tier miners all employ software that auto-switches between mining pools based on a variety of economic factors:  pool monetary policy choices, profitability and fee structure of the pool, technical availability of the pool, collective strength of the pool (size of the aggregation) versus other pools, etc.

Thus, a large and popular mining pool, dominating the market with >50% marketshare, may disappear in an instant.  Or another pool may become more profitable.  Second-tier miners all employ software that switches between first-tier aggregators in real time.  Low economic friction vis a vis market entry implies that market leadership follows three trends:
  1. Network effects generate large marketshares rapidly.
  2. Low economic friction (low cost of entry) implies market leadership changes frequently.  Every 12 months or so.
  3. The market is resilient against failure of market leaders, even those with > 50% marketshare.
It is natural and expected that miners will see a pool grow large, and switch away to other pools.  ETA:  Standard recommendation, use P2Pool.
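The switching logic described above can be sketched in a few lines.  This is a hypothetical illustration, not the algorithm of any real mining software.  It assumes each pool is summarized by availability, fee, and share of network hashrate, and that the miner avoids any pool above an arbitrary 40% share and otherwise picks the lowest fee.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Pool:
    name: str
    online: bool           # technical availability of the pool
    fee: float             # pool fee as a fraction, e.g. 0.02 == 2%
    hashrate_share: float  # pool's fraction of total network hashrate


def choose_pool(pools: List[Pool], max_share: float = 0.40) -> Optional[Pool]:
    """Pick the lowest-fee online pool whose share of network hashrate is
    below `max_share`.  Returns None if no pool qualifies."""
    candidates = [p for p in pools if p.online and p.hashrate_share < max_share]
    if not candidates:
        return None
    return min(candidates, key=lambda p: p.fee)


pools = [
    Pool("big", True, 0.00, 0.52),    # dominant pool, above the cap
    Pool("mid", True, 0.02, 0.25),
    Pool("small", True, 0.01, 0.10),
    Pool("down", False, 0.00, 0.05),  # offline
]
print(choose_pool(pools).name)  # small
pools[2].online = False         # "small" goes offline
print(choose_pool(pools).name)  # mid
```

Because miners re-run a check like this continuously, a pool that goes offline or grows too large loses hashrate almost immediately, which is the resilience described in point 3.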

Finally, remember that mining pools and miners are paid with tokens within the system -- bitcoins.  It is always in a miner's interest that bitcoins maintain their value.  Any behavior that harms the network as a whole will directly impact a large miner's income stream.  The larger the miner, the larger the impact.

Thursday, June 5, 2014

Why I will not be joining the NSA protest

(copied from a twitter rant)

Some random points about the NSA and global surveillance:

Today, in 2014, paying low volume retail prices, it costs a DIY stalker $400/month to track every human going through a single street corner.  For a large government at mass scale, their costs will be 1/100th or 1/1000th of that, or lower.  The cost of tracking everyone on the planet is falling through the floor.

Don't blame the NSA for being the first to buy a hammer off a store shelf.

Most network engineers presumed the Internet has been tapped for all its life. The Internet was not built to be secure. The original Internet protocols all sent passwords and other sensitive data over the network in plaintext. To this day, the most popular email protocol sends email across the Internet in plaintext, ensuring at least 10 entities (routers) have a copy of your email. It was trivial for any university student to snoop your email.

The NSA's global surveillance is a commentary on the future of tech for everyone. What the NSA has today, other countries have tomorrow, everyone has next year.

Further, we are presented with the obvious paradox:  Law enforcement (LEA) needs to follow criminals wherever they go. National defense needs to follow attackers around the world. If you build a space away from LEA, criminals go there, and LEA is tasked to follow.

Nevertheless...  Freedom of [physical] association, perhaps even freedom of thought, is threatened by global surveillance.  Today's global surveillance is a natural consequence of technology, not the fault of the NSA.

We now live in a world where all authors, thinkers, activists, politicians, judges, attorneys are automatically recorded.

The movement and communications of all "wired" citizens on Earth are tracked. The relevant factor is how technology advances permit the NSA to "remember" an ever higher percentage of daily data. The data firehose is staggeringly huge, even for the NSA.

No matter the layers of process protections and personal honor defending such data, access to the movements and communications of everyone will be abused for political or petty reasons.

Consider NAACP v. Alabama in the context of a universally tracked digital world.

Globally, we must have a conversation about practical freedom and privacy limits to be placed on data collected without our knowledge.  This is much bigger than the NSA, and we should not get distracted from the bigger picture of global surveillance by breathing fire at the first organization that makes use of well known techniques and technologies.

My personal recommendation: laws in every jurisdiction regarding privacy, data retention, forced data expiration (deletion), decreasing use of secret evidence, and eventual notification of investigation targets.  We must avoid the "pre-crime" trap, where predictive models lock society into a straitjacket based on word or thought alone. Citizens must be able to spout off. Youth must be allowed to screw up and be forgiven by society, rather than be cursed with a minor youthful transgression for the rest of their lives.

We must encourage the government to be transparent, while protecting the privacy of our citizens in a global, internetworked society.

Wednesday, May 14, 2014

Bitcoin and the kernel-in-a-kernel security sandbox problem

When considering sandboxes and security jails, the problem space is interesting.  There is an increasingly common pattern. I call it "kernel-in-a-kernel." The problem is not really sandboxing untrusted code, but more fundamentally, sandboxing untrusted behavior.

Bitcoin sees this acutely: bitcoind manages the bitcoin P2P network. The P2P network is flood-fill a la Usenet, and anyone may connect to any node. Built-in DoS protections are a must, but these are inevitably heuristics which duct-tape one problem area while leaving another open to algorithmic attacks ("this P2P command runs an expensive query that impacts other connected nodes").

One comprehensive solution is accounting. Account for the various resources being used by each connected party (CPU, RAM, disk b/w, ...) and verify that some connections do not starve other connections of resources. This solution is a sandbox that essentially becomes a kernel unto itself, as the solution is not merely preventing sandbox jailbreaks but at a higher level limiting algorithmic jailbreaks.
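A minimal sketch of such accounting in Python, assuming per-peer counters for a few named resources, a fixed budget per accounting period, and an equal-share rule for flagging peers.  The resource names, units, peer IDs and the equal-share policy are all illustrative assumptions, not a description of how bitcoind works.

```python
from collections import defaultdict
from typing import Dict, List


class ResourceAccountant:
    """Track how much of each resource every connected peer consumes during
    one accounting period, and flag peers using more than an equal share."""

    def __init__(self, budgets: Dict[str, float]):
        # Capacity per resource per period, e.g. {"cpu_ms": 1000.0}.
        self.budgets = budgets
        self.usage: Dict[str, Dict[str, float]] = defaultdict(
            lambda: defaultdict(float))

    def charge(self, peer: str, resource: str, amount: float) -> None:
        """Record that `peer` consumed `amount` of `resource`."""
        self.usage[peer][resource] += amount

    def over_fair_share(self) -> List[str]:
        """Peers whose use of any budgeted resource exceeds budget / N,
        where N is the number of peers seen this period."""
        n = len(self.usage)
        if n == 0:
            return []
        return sorted(
            peer for peer, used in self.usage.items()
            if any(used.get(r, 0.0) > cap / n for r, cap in self.budgets.items()))

    def reset(self) -> None:
        """Start a new accounting period."""
        self.usage.clear()


acct = ResourceAccountant({"cpu_ms": 900.0, "disk_bytes": 3_000_000.0})
acct.charge("peer-a", "cpu_ms", 100.0)
acct.charge("peer-b", "cpu_ms", 120.0)
acct.charge("peer-c", "cpu_ms", 650.0)  # e.g. a peer issuing expensive queries
acct.charge("peer-b", "disk_bytes", 1_500_000.0)
print(acct.over_fair_share())  # ['peer-b', 'peer-c']
```

What to do with a flagged peer (deprioritize its requests, throttle it, or disconnect) is a separate policy decision; the accounting only makes starvation visible, whichever resource or P2P command causes it.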

Think about the high level economics of any computing situation. You have limited resources, and various actors have valid and malicious needs of those resources.  What is the best practical model for balancing a set of limited resources, given potential malicious or buggy/haywire users of these resources?