Archive for the “BigTable” Category


Presentation about Google’s internal systems by independent researcher Toby DiPasquale, given at the Philadelphia LUG on August 2nd, 2006 (slides)

Google Internals


Given by Jeff Dean (Google) at the University of Washington on Oct 18, 2005 (video, slides)

BigTable is a distributed storage system for managing structured data that is designed to scale to a very large size.

Interesting points from the presentation:

  • Scale is too large for commercial databases, which also can’t run on clusters of cheap servers.
  • Features:
    • Distributed multi-level map
    • Fault tolerant, persistent
    • Scalable (thousands of servers, terabytes of in-memory data, petabytes of disk data, millions of reads/writes per second, efficient scans)
    • Self-managing (servers can be added/removed dynamically, servers adjust to load imbalance)
  • Largest BigTable cells (data collections) hold ~200TB spread over thousands of servers
  • Built upon:
  • multidimensional: a row (e.g. URL) and a column (attribute) together identify a cell, and each cell holds time-based (timestamped) versions of its value.
  • related rows (tablets) are located on the same machines for better performance
  • load balancing moves tablets around
  • tablets are replicated across multiple machines
  • requests like “get recent X values” are possible
  • columns can be configured to retain only X most recent entries
  • locality groups to partition tablets
  • has huge logging problems
  • a lot of opportunities for compression: time-shifted data is similar and many values are the same. Uses BMDiff (dictionary-based compression): encode ~100MB/s, decode ~1000MB/s; and Zippy (LZW-like): 179MB/s, 409MB/s
  • Compression experiment results: web pages compress to 9.2% of their original size, links to 13.2%, anchors to 12.7%
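
To make the data model in the list above more concrete, here is a minimal in-memory sketch in Python. It is not how BigTable is implemented and not its API: the class `ToyTable`, the `max_versions` setting, the method names and the example row key are all made up for illustration. It only shows the idea of a map from (row, column) to timestamped values, a "get recent X values" request, and a column configured to keep only the X most recent entries.

```python
from collections import defaultdict


class ToyTable:
    """Toy map: (row, column) -> list of (timestamp, value), newest first."""

    def __init__(self, max_versions=None):
        # max_versions is a hypothetical per-column retention setting,
        # e.g. {"contents": 2}. Columns not listed keep every version.
        self.max_versions = max_versions or {}
        self.cells = defaultdict(list)

    def put(self, row, column, timestamp, value):
        versions = self.cells[(row, column)]
        versions.append((timestamp, value))
        # Keep the newest version first.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        limit = self.max_versions.get(column)
        if limit is not None:
            # Drop everything beyond the newest `limit` versions.
            del versions[limit:]

    def get_recent(self, row, column, n=1):
        """Return up to n most recent (timestamp, value) pairs of a cell."""
        return self.cells.get((row, column), [])[:n]


# Hypothetical usage: row = URL, column = attribute.
table = ToyTable(max_versions={"contents": 2})
row = "com.example/index.html"
table.put(row, "contents", 1, "v1")
table.put(row, "contents", 2, "v2")
table.put(row, "contents", 3, "v3")
table.put(row, "language", 1, "en")

latest = table.get_recent(row, "contents", n=1)
kept = table.get_recent(row, "contents", n=10)
```

After the three writes to `contents`, only the two newest versions remain because of the retention setting, while `language` keeps everything it was given. A real system would also have to deal with persistence, distribution across tablets and compression, none of which this sketch attempts.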

Update: Luke Baker took screenshots of all the slides from the video (not quite in the right order).


Google TechTalks presentation by Narayanan Shivakumar (Google Inc.) on May 31, 2006 (video)

Interesting presentation outlining the major parts of the infrastructure Google uses to run all its projects.

These are the main infrastructure systems used at Google:

P.S. Unfortunately the slides are hard to see in the video, so you’ll have to listen patiently.
