8/6/2019 2008 Au Tim Wikipedia
Wikimedia's Squid Network
August 2008
-
Wikimedia Foundation
Non-profit organisation
Registered 501(c)(3) charity
Funded entirely by donations
Head office in San Francisco
Operates Wikipedia and ~10 other websites
21 staff members, including 6 tech staff
-
Me
Tim Starling
Working for Wikipedia (later Wikimedia) since 2002
Paid to do it since 2006
Multiple roles: developer and system administrator
-
Wikipedia
Ranked 7-8 on Alexa, with a daily reach of 9%
250 languages (written from scratch in each language, not translated)
Far bigger than any other encyclopedia
-
Wikipedia
It's a wiki (from the Hawaiian word for "quick")
As soon as an edit is performed, it is instantly visible to everyone
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
-
Operations
Squid reports cache size of ~500 GB text, ~770 GB images
Text backend: PHP/MySQL
Images backend: lighttpd/NFS
Terabytes of old revisions (we keep them all)
50,000 req/s at peak
-
Operations
Load divided by geographic DNS
[Map: Tampa (DB), Amsterdam, Seoul]
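Geographic DNS answers the same hostname differently depending on where the client appears to be. As a minimal sketch of that lookup, the table below maps a client's country code to one of the three sites. The country assignments and the Tampa fallback are hypothetical; the slides do not list them.

```python
# Hypothetical table: the slides do not say which countries go to which site.
CLUSTER_FOR_COUNTRY = {
    "NL": "Amsterdam", "DE": "Amsterdam", "FR": "Amsterdam", "GB": "Amsterdam",
    "KR": "Seoul", "JP": "Seoul", "CN": "Seoul",
}
DEFAULT_CLUSTER = "Tampa"  # assumed fallback: the site that also hosts the backend


def cluster_for(country_code: str) -> str:
    """Return the squid cluster a geographic DNS server might hand to this client."""
    return CLUSTER_FOR_COUNTRY.get(country_code.strip().upper(), DEFAULT_CLUSTER)
```

A real geo-DNS server would then return the address of that cluster's LVS load balancer rather than a cluster name.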
-
Tampa, Florida
Where Jimmy Wales used to live
Backend
24 image squids
25 text squids
We pay for bandwidth
-
Amsterdam, The Netherlands
14 image squids
15 text squids
Peering on AMS-IX
Hosting and transit donated by Kennisnet
-
Seoul, South Korea
9 image squids
8 text squids
Hosting donated by Yahoo! Korea
-
Squid network
Each squid server has two instances of squid running
root@sq1:~# ps -e --forest | grep squid
 4890 ?   00:00:00 squid
 4893 ? 1-11:03:14  \_ squid
 4919 ?   00:00:00 squid-frontend
 4921 ? 2-01:07:02  \_ squid-frontend
[Diagram: LVS load balancer → frontend squid → cache squid]
-
Squid network
LVS
Simple load balancing
Frontend squid
Small memory-only cache
Client ACLs
Selects a cache squid using CARP (URL hashing)
Cache squid
Memory and disk cache
Forwards to Tampa squid cluster, or to backend
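CARP works by scoring every cache squid against the requested URL and picking the highest score. Because every frontend computes the same scores, they all send a given URL to the same cache squid without coordinating, and removing one cache squid only moves the URLs it was holding. The sketch below uses SHA-256-based rendezvous (highest-random-weight) hashing as a stand-in. It is not the exact hash function or load-factor weighting of the CARP specification or Squid's implementation, and the server names in the usage are hypothetical.

```python
import hashlib
from typing import Sequence


def _score(url: str, member: str) -> int:
    """Pseudo-random weight for a (URL, cache squid) pair."""
    digest = hashlib.sha256(f"{member}\x00{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def select_cache_squid(url: str, members: Sequence[str]) -> str:
    """Highest-random-weight choice: every frontend picks the same member for a URL."""
    if not members:
        raise ValueError("no cache squids configured")
    return max(members, key=lambda member: _score(url, member))
```

For example, `select_cache_squid("http://en.wikipedia.org/wiki/Squid", ["sq1", "sq2", "sq3"])` returns the same name on every frontend that shares that member list.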
-
Caching strategy
Backend gives a licence to cache (s-maxage of 2678400 s, i.e. 31 days)
Cache-Control: s-maxage=2678400, must-revalidate, max-age=0
Squid replaces the header to suppress external caches
header_access Cache-Control allow tiertwo
header_replace Cache-Control private, s-maxage=0, max-age=0, must-revalidate
Backend sends an HTCP CLR message to all subscribed squids when an object changes
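The two Cache-Control values above say different things to different caches. Here is a minimal sketch that parses them and computes how long a shared cache may keep the response fresh. It also mimics the header_access/header_replace pair: other squid tiers (the `tiertwo` ACL) see the backend's header, and everyone else gets the replacement. The function names and the boolean `client_is_tier_two` are illustrative assumptions, not Squid internals.

```python
from typing import Dict, Optional

# Header values quoted from the slide
BACKEND_CACHE_CONTROL = "s-maxage=2678400, must-revalidate, max-age=0"
EXTERNAL_CACHE_CONTROL = "private, s-maxage=0, max-age=0, must-revalidate"


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        name, sep, arg = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def shared_cache_lifetime(value: str) -> int:
    """Seconds a shared cache (such as squid) may treat the response as fresh."""
    d = parse_cache_control(value)
    if "private" in d or "no-store" in d:
        return 0
    for key in ("s-maxage", "max-age"):  # s-maxage takes precedence in shared caches
        arg = d.get(key)
        if arg is not None:
            return int(arg)
    return 0


def header_for_client(backend_value: str, client_is_tier_two: bool) -> str:
    """Only another squid tier sees the backend's licence to cache."""
    return backend_value if client_is_tier_two else EXTERNAL_CACHE_CONTROL
```

Because of max-age=0 and must-revalidate, browsers always revalidate. The squids keep pages for up to 31 days and rely on HTCP CLR purges to drop stale copies early.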
-
Caching strategy
Logged-in page views are not cached in squid
Vary: Accept-Encoding, Cookie
Images and CSS are cached for everyone, with no Vary header
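With Vary, the cache stores a separate copy for each distinct combination of the named request headers. Here is a simplified sketch of that keying. It shows why `Vary: Accept-Encoding, Cookie` splits page views per cookie value while images and CSS, sent without Vary, share one copy per URL. Squid's real variant handling differs in detail. The function name and the lower-cased header dictionary are assumptions.

```python
from typing import Mapping, Optional


def vary_key(url: str, vary: Optional[str], request_headers: Mapping[str, str]) -> str:
    """Build a cache key from the URL plus the request headers named in Vary.

    request_headers is assumed to be keyed by lower-case header name.
    """
    if not vary:
        # No Vary header (e.g. images and CSS): one shared copy per URL
        return url
    names = sorted({n.strip().lower() for n in vary.split(",") if n.strip()})
    parts = [url] + [f"{n}={request_headers.get(n, '')}" for n in names]
    return "\n".join(parts)
```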
-
HTCP CLR
[Diagram: purges travel from the backend to the Tampa squids and, via udpmcast.py relays in Tampa, Amsterdam and Seoul, to the Amsterdam and Seoul squids; links are labelled multicast and unicast]
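Multicast does not normally cross the Internet between sites, so relays are needed to carry the purges. Here is a minimal sketch of such a relay: it receives each datagram (for example an HTCP CLR) and copies the bytes unchanged to a list of destinations, which may be unicast relays at other sites or a local multicast group. This is a hypothetical illustration, not the actual udpmcast.py. The function names, group addresses and buffer size are assumptions.

```python
import socket
import struct
from typing import Iterable, Tuple

Address = Tuple[str, int]


def join_multicast_group(sock: socket.socket, group: str, iface: str = "0.0.0.0") -> None:
    """Subscribe a bound UDP socket to a multicast group (struct ip_mreq)."""
    mreq = struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(iface))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, mreq)


def relay_once(listen_sock: socket.socket, send_sock: socket.socket,
               destinations: Iterable[Address], bufsize: int = 65535) -> bytes:
    """Receive one datagram and forward it, byte for byte, to every destination."""
    data, _source = listen_sock.recvfrom(bufsize)
    for dest in destinations:
        send_sock.sendto(data, dest)
    return data


def relay_forever(listen_sock: socket.socket, send_sock: socket.socket,
                  destinations: Iterable[Address]) -> None:
    """Run the relay loop; the packets are never parsed, only copied."""
    dests = list(destinations)
    while True:
        relay_once(listen_sock, send_sock, dests)
```

When a destination is a multicast group, the sending socket would usually also need `IP_MULTICAST_TTL` set so the packets reach the squids on the local network.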
-
Australian squid cluster?
Our ideal Australian squid cluster would:
Be a single cluster of a few dedicated servers
Give us root access
Serve the whole region (Australia, New Zealand, Indonesia)
Expected traffic: ~100 Mbps outbound at peak
-
Patches
Make HTCP CLR work when you use Vary headers
Make header_replace only act on response headers
UDP logging
X-Vary-Options
All open source (in debs/squid)
-
UDP logging
Squid access logs are sent by UDP to an aggregation host in Seoul
The log stream is processed by various scripts in real time
Only a 1/1000 sample is logged to disk
The whole system is openly documented and open source: https://wikitech.leuksman.com/view/Squid_logging
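Here is a minimal sketch of the aggregation side. It reads log lines out of UDP datagrams and keeps a deterministic 1-in-N sample for writing to disk, while the full stream stays available to real-time consumers. The slides do not say how the real sampler chooses lines. The every-Nth-line rule, the function names and the buffer size are assumptions.

```python
import socket
from typing import Iterable, Iterator


def sample(lines: Iterable[str], factor: int = 1000) -> Iterator[str]:
    """Yield every factor-th line (a deterministic 1/factor sample)."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    for count, line in enumerate(lines, start=1):
        if count % factor == 0:
            yield line


def udp_lines(sock: socket.socket, bufsize: int = 65535) -> Iterator[str]:
    """Yield log lines from datagrams; one datagram may carry several lines."""
    while True:
        data, _addr = sock.recvfrom(bufsize)
        for line in data.decode("utf-8", "replace").splitlines():
            if line:
                yield line
```

A collector would bind a UDP socket, then write `sample(udp_lines(sock))` to a file.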
-
X-Vary-Options
The Vary header is a blunt instrument
"Vary: Cookie" is not a good way to detect logins when you have JS-only cookies
"Vary: Accept-Encoding" is not a good way to detect gzip support
X-Vary-Options: Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;string-contains=enwikiLoggedOut;string-contains=enwiki_session;string-contains=centralauth_Token;string-contains=centralauth_Session;string-contains=centralauth_LoggedOut
-
X-Vary-Options
The X-Vary-Options patch gives fine-grained control over how squid constructs Vary records
The syntax is extensible
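Instead of keying a cached variant on a header's full value, X-Vary-Options keys it on yes/no questions about that value. Here is a sketch of that idea using the header from the previous slide. Anonymous readers with different JS-only cookies share one variant, while a session cookie or the lack of gzip support produces a different one. The option names come from the slide, but the exact matching rules implemented here are assumptions, not the patch's code.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Rule = Tuple[str, str]  # (option kind, argument), e.g. ("list-contains", "gzip")

SLIDE_VALUE = ("Accept-Encoding;list-contains=gzip,Cookie;string-contains=enwikiToken;"
               "string-contains=enwikiLoggedOut;string-contains=enwiki_session;"
               "string-contains=centralauth_Token;string-contains=centralauth_Session;"
               "string-contains=centralauth_LoggedOut")


def parse_x_vary_options(value: str) -> Dict[str, List[Rule]]:
    """Parse 'Header;kind=arg;kind=arg,Header;...' into {header: [(kind, arg), ...]}."""
    rules: Dict[str, List[Rule]] = {}
    for entry in value.split(","):
        fields = [f.strip() for f in entry.split(";") if f.strip()]
        if not fields:
            continue
        header_rules = rules.setdefault(fields[0].lower(), [])
        for option in fields[1:]:
            kind, _, arg = option.partition("=")
            header_rules.append((kind.strip().lower(), arg.strip()))
    return rules


def _matches(kind: str, arg: str, header_value: Optional[str]) -> bool:
    if header_value is None:
        return False
    if kind == "list-contains":
        # Compare comma-separated tokens, ignoring parameters such as ';q=0.5'
        tokens = [t.split(";")[0].strip().lower() for t in header_value.split(",")]
        return arg.lower() in tokens
    if kind == "string-contains":
        return arg in header_value
    raise ValueError(f"unsupported option: {kind}")


def x_vary_key(url: str, rules: Dict[str, List[Rule]],
               request_headers: Mapping[str, str]) -> str:
    """Cache key built from yes/no answers instead of raw header values."""
    parts = [url]
    for header in sorted(rules):
        value = request_headers.get(header)  # headers keyed by lower-case name
        bits = "".join("1" if _matches(k, a, value) else "0" for k, a in rules[header])
        parts.append(f"{header}:{bits}")
    return "|".join(parts)
```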
-
Configurator
PHP script to generate configuration files for every squid instance
'apaches' => array(
    'pmtpa' => array(
        'test.wikipedia.org' => 'srv35.pmtpa.wmnet',
        '=wap_domains' => 'yongle.wikimedia.org',
        'ls2.wikimedia.org' => 'srv77.pmtpa.wmnet',
        'whygive.wikimedia.org' => 'isidore.wikimedia.org',
        'blog.wikimedia.org' => 'isidore.wikimedia.org',
        '=thumb_php' => 'rendering.pmtpa.wmnet',
        'apaches.pmtpa.wmnet', # LVS
    ),
),
-
Configurator
# srv35.pmtpa.wmnet
cache_peer 10.0.2.35 parent 80 3130 originserver no-query connect-timeout=5 login=PASS
cache_peer_access 10.0.2.35 deny wap_domains
cache_peer_access 10.0.2.35 deny ls2_wikimedia_org
cache_peer_access 10.0.2.35 deny whygive_wikimedia_org
cache_peer_access 10.0.2.35 deny blog_wikimedia_org
cache_peer_access 10.0.2.35 deny thumb_php
cache_peer_access 10.0.2.35 allow test_wikipedia_org
cache_peer_access 10.0.2.35 deny all
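The two Configurator slides show one input mapping and the stanza generated for srv35. Here is a Python sketch that reproduces that stanza from the mapping. The real configurator is a PHP script and is not public, so this is only an illustration. The rule it applies is inferred from the single example: deny every ACL routed to another backend, allow this backend's own ACLs, then deny everything else. The "allow all" for the catch-all LVS backend is an assumption. The IP addresses other than 10.0.2.35 are hypothetical.

```python
from typing import Dict, List, Tuple

# cache_peer options copied from the generated config on the slide
PEER_OPTIONS = "parent 80 3130 originserver no-query connect-timeout=5 login=PASS"


def acl_name(key: str) -> str:
    """'=name' refers to a named ACL; a domain becomes an ACL name with dots -> underscores."""
    return key[1:] if key.startswith("=") else key.replace(".", "_")


def peer_config(routes: List[Tuple[str, str]], default_backend: str,
                resolve: Dict[str, str]) -> str:
    """Emit cache_peer / cache_peer_access lines for every backend.

    routes: ordered (ACL key, backend host) pairs, as in the PHP array.
    default_backend: the catch-all entry (the LVS address on the slide).
    resolve: host -> IP address (a stand-in for DNS lookups).
    """
    backends: List[str] = []
    for _key, host in routes:
        if host not in backends:
            backends.append(host)
    if default_backend not in backends:
        backends.append(default_backend)

    lines: List[str] = []
    for host in backends:
        ip = resolve[host]
        lines.append(f"# {host}")
        lines.append(f"cache_peer {ip} {PEER_OPTIONS}")
        # Keep requests that belong to other backends away from this one
        for key, target in routes:
            if target != host:
                lines.append(f"cache_peer_access {ip} deny {acl_name(key)}")
        # Then allow this backend's own ACLs
        for key, target in routes:
            if target == host:
                lines.append(f"cache_peer_access {ip} allow {acl_name(key)}")
        # Assumption: the default backend takes whatever is left; others take nothing else
        verdict = "allow" if host == default_backend else "deny"
        lines.append(f"cache_peer_access {ip} {verdict} all")
    return "\n".join(lines)
```

Generating the rules from one table keeps the deny lists for all backends consistent, which is hard to maintain by hand once several hosts share a cluster.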
-
Configurator
Generates complex ACL rules
Supports origin servers behind multiple cache clusters
Can configure things like cache_mem, per-server and per-cluster
Perl deployment script pushes the configuration files out by ssh and HUPs squid
Not public (but might be some day)
-
Questions?
Image credits:
Wikimedia logo, Wikipedia logo: Copyright Wikimedia Foundation, all rights reserved
Wiki Wiki bus: Andrew Laing, cc-by-sa-2.0
Squid: from http://www.squid-cache.org/Artwork/ cc-by-nc-sa
Florida locator: Huebi, cc-by
Netherlands locator: Quizmodo, PD
South Korea locator: Vardion, GFDL