Archive for November, 2006
Given by Jeff Dean (Google) at the University of Washington on Oct 18, 2005 (video, slides)
BigTable is a distributed storage system for managing structured data that is designed to scale to a very large size.
Interesting quotes from presentation:
- The scale is too big for commercial databases, which also can’t run on clusters of cheap servers.
- Features:
- Distributed multi-level map
- Fault tolerant, persistent
- Scalable (thousands of servers, terabytes of in-memory data, petabyte of disk data, millions/sec of r/w, efficient scans)
- Self-managing (servers can be added/removed dynamically, servers adjust to load imbalance)
- Largest BigTable cells (data collections) hold ~200TB spread over thousands of servers
- Built upon:
- multidimensional – row (e.g. URL) and column (attribute) identify a cell; each cell holds time-stamped versions of its value.
- related rows (tablets) are located on the same machines for better performance
- load balancing moves tablets around
- tablets are replicated across multiple machines
- requests like “get recent X values” are possible
- columns can be configured to retain only X most recent entries
- locality groups to partition tablets
- has huge logging problems
- a lot of opportunities for compression – time-shifted data is similar, many values are the same. Using BMDiff (dictionary-based compression) – encode ~100MB/s, decode ~1000MB/s; Zippy (LZW-like) – 179MB/s, 409MB/s
- Compression experiment results: web pages compress at 9.2%, links at 13.2%, anchors at 12.7%
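The data model notes above (row and column pick a cell, each cell keeps time-stamped versions, "get recent X values", "retain only X most recent entries") can be sketched as a toy in-memory map. This is only an illustration of the idea, not BigTable's API: the class name `TinyTable`, its methods and the `max_versions` parameter are all made up for this example.

```python
from collections import defaultdict
import time


class TinyTable:
    """Toy map: (row, column) -> list of (timestamp, value), newest first.

    Hypothetical illustration only; names and behaviour are assumptions.
    """

    def __init__(self, max_versions=None):
        # max_versions mimics "retain only X most recent entries";
        # None means keep every version.
        self.max_versions = max_versions
        self.cells = defaultdict(list)

    def put(self, row, column, value, timestamp=None):
        """Store a new version of the cell at (row, column)."""
        ts = time.time() if timestamp is None else timestamp
        versions = self.cells[(row, column)]
        versions.append((ts, value))
        # Keep versions ordered newest first.
        versions.sort(key=lambda tv: tv[0], reverse=True)
        if self.max_versions is not None:
            # Drop everything older than the newest max_versions entries.
            del versions[self.max_versions:]

    def get_recent(self, row, column, n=1):
        """Return up to n most recent values of the cell, newest first."""
        versions = self.cells.get((row, column), [])
        return [value for _, value in versions[:n]]


if __name__ == "__main__":
    table = TinyTable(max_versions=2)
    url = "http://example.com/"
    for ts, body in enumerate(["v1", "v2", "v3"], start=1):
        table.put(url, "contents", body, timestamp=ts)
    print(table.get_recent(url, "contents", n=5))  # ['v3', 'v2']
```

Asking for five versions returns only two here, because the table was configured to retain the two most recent entries per cell.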
Update: Luke Baker made screenshots of all the slides from the video (not quite in the right order).
Google TechTalks presentation by Narayanan Shivakumar (Google Inc.) on May 31, 2006 (video)
Interesting presentation outlining the major parts of the infrastructure Google uses to run all its projects.
These are the main infrastructure systems used at Google:
P.S. Unfortunately the slides are hard to see in the video, so you’ll have to listen patiently.
Google TechTalk by Mark Shuttleworth on November 9, 2006 at the Ubuntu Linux Development Summit (video).
Not really about technologies, but about the organisational infrastructure of an Open Source software project.
Interesting notes (not necessarily quotes):
- Open Source is a new economics, a new culture and a new way of producing software (Ubuntu is different from RedHat and Novell)
- Open community – because the company is distributed, most of the employees work from home.
- Launchpad.net – community software development site that tracks other Open Source project developments (Apache, for example, being an important part of the Ubuntu distribution), bugs, translations, source control, etc.
- Desktop Linux will take off when Dell chooses a chip vendor (among vendors with the same price) based on their Linux drivers.
- “Kernel guys live on the tip – they’re not concerned about older systems.” (meaning that it’s fine for servers but bad for desktops)
- Check Google trends for Ubuntu
Presentation about how PHP is used at Yahoo!, by Michael J. Radwin given at MySQL User Conference on April 26, 2006 (PPT slides) (previously at Zend/PHP Conference – PDF version).

Some quotes:
- May 2002: yScript -> PHP
- Why we picked PHP
- Designed for web scripting
- High performance
- Large, Open Source community
- Documentation, easy to hire developers
- “Code-in-HTML” paradigm
- Integration, libraries, extensibility
- Tools: IDE, debugger, profiler
- ./configure --disable-all
- Security: open_basedir, allow_url_fopen = Off, display_errors = Off, safe_mode = Off; input_filter hook
- Performance: Opcode Caches, PHP Extensions in C++
- Globalization: PHP Unicode (2006)
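The security settings quoted above are ordinary php.ini directives, so they are easy to check mechanically. Below is a minimal sketch of such a check, assuming the settings live in a php.ini-style text; the function name `audit_php_ini` and the rule that every listed directive must be set explicitly are my own assumptions, not something from the slides.

```python
import configparser

# Directives the notes list as "= Off".
MUST_BE_OFF = ("allow_url_fopen", "display_errors", "safe_mode")
# Spellings treated as "off" in this sketch (an assumption, kept deliberately small).
OFF_VALUES = {"off", "0", "false", "no"}


def audit_php_ini(ini_text):
    """Return a list of human-readable problems found in php.ini-style text."""
    parser = configparser.ConfigParser(
        interpolation=None,             # php.ini values may contain '%'
        strict=False,                   # tolerate repeated sections and keys
        allow_no_value=True,
        inline_comment_prefixes=(";",),
    )
    # Prepend a section header so files without one still parse.
    parser.read_string("[PHP]\n" + ini_text)

    settings = {}
    for section in parser.sections():
        settings.update(parser.items(section))

    problems = []
    for name in MUST_BE_OFF:
        value = (settings.get(name) or "").strip().strip('"').lower()
        if value not in OFF_VALUES:
            problems.append(f"{name} should be explicitly Off")
    if not (settings.get("open_basedir") or "").strip():
        problems.append("open_basedir should be set")
    return problems


if __name__ == "__main__":
    sample = "display_errors = On\nopen_basedir = /home/site\n"
    for problem in audit_php_ini(sample):
        print(problem)
```

The checker only covers the directives named in the list; the input_filter hook is a C-level extension point and is outside what a config check can see.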
Historical trail of talks about PHP at Yahoo! (worth reading too):
Presentation by Michael J. Radwin given at OSCON 2006 on July 27, 2006 (previously given at ApacheCon 2005).
Slides in HTML, PDF, PPT.

Interesting quotes:
- Still running Apache 1.3, actively porting to Apache 2.2 (2006)
- What’s wrong with threads?
- too hard for most programmers to use
- even for experts, development is painful
- Custom log file format
- Signal-free log rotation
- Bandwidth reduction
- Smaller 30x bodies
- Custom gzip
- No need for StartServers, MaxSpareServers, MinSpareServers – just MaxClients (constant pool size)
- Accept Filtering on FreeBSD
- SendBufferSize 229376, NO_LINGCLOSE – don’t wait for the client to read the response
- YahooHostHtmlComment
- SSL Acceleration cards, stunnel
- ysar
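The SendBufferSize item above is about the kernel's TCP send buffer: if a whole response fits in that buffer, the server process can hand it to the kernel and move on instead of waiting for a slow client. Here is a rough sketch of the same idea at the socket level, assuming a plain TCP listener; the helper name `make_listener` is hypothetical, and the operating system may cap or adjust the size that was asked for.

```python
import socket

SEND_BUFFER_BYTES = 229376  # the value quoted in the notes above


def make_listener(host="127.0.0.1", port=0, send_buffer=SEND_BUFFER_BYTES):
    """Create a listening TCP socket with a requested send buffer size.

    Illustrative only: this is not how Apache is implemented internally.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Ask the kernel for a larger send buffer; it is free to clamp the value.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, send_buffer)
    sock.bind((host, port))  # port 0 lets the OS pick a free port
    sock.listen(128)
    return sock


if __name__ == "__main__":
    listener = make_listener()
    effective = listener.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
    print("requested", SEND_BUFFER_BYTES, "kernel reports", effective)
    listener.close()
```

The number the kernel reports back often differs from the requested one, so it is worth printing the effective value rather than assuming the setting took effect.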