The Great Migration by Baruch Sadogursky
TRANSCRIPT
THE GREAT MIGRATION @jbaruch
a.k.a. @jbaruch
JFrog
Artifactory
Bintray
Groovy
Architecture
Quick overview to fill in, then get to business.
Social software distribution platform
Interwebz, Client, Client
DNS
Interaction, Download, ES, DB, DB, CDN
Load Balancer
HTTPS
bintray.com
dl.bintray.com
The download server is the most important.
dl.bintray.com
jcenter.bintray.com
JCenter on Grails or Gradle
Webapp & Micro: Grails, Groovy, Spock, Geb, GPars
Build & Auto: Gradle, Groovy, GVM, Crash
~ 90%
- SOAP -!
1 + 1 = 2
1 + 2 = 3
1 + 3 = ?
: Sitemap
crawler-?
Runs once a day.
Iterates over all major content.
https://www.flickr.com/photos/calafellvalo/2859947965
XML
50 MB max
50K entries max
Don't we all love XML? It makes incremental changes a PITA.
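The 50K-entry limit means a large site has to be split across several sitemap files. A minimal sketch of that split, assuming a flat list of URLs; the helper name `splitSitemap` and the file-name pattern are hypothetical, and the 50 MB size limit is not checked here.

```groovy
// Sketch only: splitSitemap and the "sitemap-N.xml" naming are illustrative assumptions.
def splitSitemap(List<String> urls, int maxEntries = 50_000) {
    def files = [:]
    // collate() slices the list into consecutive chunks of at most maxEntries items
    urls.collate(maxEntries).eachWithIndex { chunk, i ->
        files["sitemap-${i + 1}.xml".toString()] = chunk
    }
    files
}

// 120K URLs end up in three files: 50K + 50K + 20K
def urls = (1..120_000).collect { "https://example.org/pkg/${it}".toString() }
def files = splitSitemap(urls)
assert files.size() == 3
assert files['sitemap-3.xml'].size() == 20_000
```

Because every chunk depends on the position of every entry before it, adding one entry in the middle can shift all later files, which is one reason incremental updates are painful.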
Webapp, Jenkins
nginx, Sitemap, Domains, Mapper
Still small. Not much content. Pragmatic.
Runs once a day. Iterates over all major content.
Query GORM and write the output.
10sK users
80K repositories
230K packages
+40 MB of XML
Incremental update is a PITA.
https://www.flickr.com/photos/bryanburke/2854366734
Not the actual JFrog offices
Jenkins
DB
XML
Garbage
Mapper
Not quick enough. I love GORM, but GORM is chubby: too much, too soon.
There are solutions:
Could provide more memory (stupid, a heroin addiction).
Fire up an interaction instance closed off to the world and run it there.
https://www.flickr.com/photos/mit-libraries/3424098958
We cannot allow it to disturb either the webapp or the download server
As quick as possible, but throttled should it disturb any service.
Can be interrupted, so that it doesn't go kamikaze.
Site Mapper, Jenkins
XML
Ratpack-powered site mapper
HTTP commands
Crash commands
Dump index as JSON
Slurp with JsonSlurper
Write XML with XMLBuilder
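The three steps above can be sketched with Groovy's bundled JSON and XML support. This is an illustration, not the actual site mapper: the JSON field names (`path`, `updated`) are hypothetical, and `groovy.xml.MarkupBuilder` is assumed to be what the slide calls "XMLBuilder".

```groovy
import groovy.json.JsonSlurper
import groovy.xml.MarkupBuilder

// Hypothetical index dump; the real field names are not given in the talk.
def json = '''[
  {"path": "/alice/maven/lib-a", "updated": "2015-01-10"},
  {"path": "/bob/rpm/tool-b",    "updated": "2015-02-03"}
]'''

// Slurp: parse the JSON dump into plain lists and maps
def entries = new JsonSlurper().parseText(json)

// Write: emit one <url> element per entry in sitemap format
def out = new StringWriter()
new MarkupBuilder(out).urlset(xmlns: 'http://www.sitemaps.org/schemas/sitemap/0.9') {
    entries.each { entry ->
        url {
            loc("https://bintray.com${entry.path}")
            lastmod(entry.updated)
        }
    }
}
def sitemap = out.toString()
```

No GORM domain objects are loaded at any point; the mapper only sees lightweight maps, which is the point of going through a JSON dump.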
GPars
Reads and writes are concurrent.
Can be stopped in the middle.
Speed throttled.
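A rough sketch of those three properties, using plain JDK concurrency classes as a stand-in for GPars (the talk uses GPars; this does not). The `stopped` flag, the delay value, and the entry names are all hypothetical.

```groovy
import java.util.concurrent.ArrayBlockingQueue
import java.util.concurrent.TimeUnit
import java.util.concurrent.atomic.AtomicBoolean

def queue = new ArrayBlockingQueue<String>(100)   // hand-off buffer between reader and writer
def stopped = new AtomicBoolean(false)            // would be flipped by a "stop" command
def readerDone = new AtomicBoolean(false)
def delayMs = 1L                                  // throttle: pause between reads
def written = Collections.synchronizedList([])

// Reader: pulls entries from the index, throttled, checking the stop flag on every step
def reader = Thread.start {
    for (i in 1..50) {
        if (stopped.get()) break
        queue.put("entry-${i}".toString())
        sleep(delayMs)
    }
    readerDone.set(true)
}

// Writer: drains the queue concurrently; exits once stopped or once the reader is done and the queue is empty
def writer = Thread.start {
    while (!stopped.get()) {
        def item = queue.poll(20, TimeUnit.MILLISECONDS)
        if (item != null) {
            written << item
        } else if (readerDone.get() && queue.isEmpty()) {
            break
        }
    }
}

reader.join()
writer.join()
```

Calling `stopped.set(true)` from another thread ends both loops within one iteration, and raising `delayMs` slows the run down without restarting it.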
ElasticSearch !
https://github.com/elastic/elasticsearch-groovy
The syntax is closer to the source JSON
sitemap, !
RPM: Redhat, YUM, 4 XML+GZ, Repository
POM: Java, Maven, 1 XML, Module
DEB: Debian, APT, 1 TEXT/GZ, Structured
Explain typed and indexed repositories in Bintray.
The majority of tools host an index consumed by the client. Repositories must update these indexes with every update to the repo.
Maven also reflects on Gradle: it's the same indexing.
RPM: createrepo
POM: Aether
DEB: reprepro
Client
Index
Binaries
Index, Binaries
Create a filesystem folder with all files, run the tool.
Run every time you change the index.
Requires native tools.
OS dependent.
: repomd
RPM
Maven
Debian
Bower
Docker
Gems
NPM
Pypi
Vagrant
API Java ( Groovy)
OS agnostic
Embeddable
Usable by any JVM language
Client
Publish
Index, Binaries
~600K files
~3 hours
~400 MB of index
Webapp, repomd
Maven is outstanding - many package systems embed descriptors in the package. EXPLAIN THIS
Have to fetch the descriptors.
Quad-core machine, multiple workers.
Multiply by the number of customers.
Concurrent calculations.
https://www.flickr.com/photos/thinredjellies/408275494
Client
Publish
Load Balancer
Instance 1: repomd
Instance ...: repomd
Instance N: repomd
DB
How can you tell when a multi-module deployment completes? - EXPLAIN THIS
How can you synchronize multiple embedded repomd processes?
Now we are stuck with
https://www.flickr.com/photos/wackystuff/14931244568
Resque
Redis: atomic, O(1), distributed, persistence, queries, replication
Queue per type
Worker / Producer
Jesque: Java-flavored Resque
https://github.com/resque/resque
https://github.com/gresrun/jesque
Resque started at GitHub. They needed monitorable throwaway tasks: important, but not life-critical. Queue-based workers. Jesque is Java-based.
// Producer side: enqueue a job onto a named queue
def job = new Job('WorkerClassName', ['arg1', 'arg2'])
jesqueClient.enqueue('queueName', job)
Client, Redis
Jesque
// Worker side: map job names to worker classes, then poll the queue
def factorySettings = [workerName: WorkerClass]
def jobFactory = new MapBasedJobFactory(factorySettings)
def worker = new WorkerImplFactory(jesqueConfig, ['queueName'], jobFactory).call()
threadPool.submit(worker)
enqueue('workerType') {
    arg1 = 'value1'
    arg2 = false
}
Bintray, Redis
Gresque
submit(WorkerType) {
    conf1 = 'value1'
    conf2 = false
}
GPars
Handle clustering better
Nice DSL
...
Bintray: Events (QuietPeriod)
Bintray, Bintray
Redis
Ratpack, BTW
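The slides do not spell out how the quiet period works. One plausible reading is that an index is recalculated only after a repository has received no new upload events for some interval, which would also answer the multi-module deployment question raised earlier. A sketch under that assumption; the class `QuietPeriodTracker`, its methods, and the 5-second interval are hypothetical, and in a cluster the timestamps would presumably live in Redis rather than in a local map.

```groovy
// Hypothetical helper: remembers the last event per repository and reports
// the repositories that have been quiet long enough to be re-indexed.
class QuietPeriodTracker {
    final long quietMillis
    private final Map<String, Long> lastEvent = [:]

    QuietPeriodTracker(long quietMillis) { this.quietMillis = quietMillis }

    void onEvent(String repo, long nowMillis) {
        lastEvent[repo] = nowMillis        // every new upload restarts the countdown
    }

    List<String> drainQuiet(long nowMillis) {
        def ready = lastEvent.findAll { repo, ts -> nowMillis - ts >= quietMillis }.keySet().toList()
        ready.each { lastEvent.remove(it) }   // report each quiet repository only once
        ready
    }
}

def tracker = new QuietPeriodTracker(5_000)
tracker.onEvent('acme/rpm', 0)
tracker.onEvent('acme/rpm', 3_000)                  // second module of the same deployment
assert tracker.drainQuiet(6_000) == []              // only 3s since the last upload
assert tracker.drainQuiet(8_000) == ['acme/rpm']    // 5s of silence: safe to index
```

Time is passed in explicitly so the behaviour can be checked without sleeping.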
, , XML RPM
primary.xml: general archive info
filelists.xml: lists files in RPM
other.xml: misc attributes
repomd.xml: inventory of the above
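To make the "inventory" role of repomd.xml concrete, here is a simplified sketch that lists the three other metadata files with a checksum each. The byte contents are placeholders, and real repositories typically reference gzipped files and carry more fields than shown; treat the element set as an approximation of the format, not a complete writer.

```groovy
import groovy.xml.MarkupBuilder
import java.security.MessageDigest

// Hex-encoded SHA-256 of a byte array
def sha256 = { byte[] bytes ->
    MessageDigest.getInstance('SHA-256').digest(bytes).encodeHex().toString()
}

// Hypothetical, tiny stand-ins for the three generated metadata files
def metadata = [
    primary  : '<metadata/>'.bytes,
    filelists: '<filelists/>'.bytes,
    other    : '<otherdata/>'.bytes
]

def out = new StringWriter()
new MarkupBuilder(out).repomd(xmlns: 'http://linux.duke.edu/metadata/repo') {
    metadata.each { type, bytes ->
        data(type: type) {
            checksum(type: 'sha256', sha256(bytes))
            location(href: "repodata/${type}.xml")
        }
    }
}
def repomd = out.toString()
```

Any change to a single package changes primary.xml, which changes its checksum, which changes repomd.xml, so every publish touches the whole chain.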
Truly sorry you had to witness
DOM/STAX LOL
XMLBuilder
private def primaryPackageBuilder = { del, packageMd ->
    del.'package'(type: 'rpm') {
        name(packageMd.name)
        arch(packageMd.architecture)
        version(epoch: packageMd.epoch, ver: packageMd.version, rel: packageMd.release)
        checksum(type: 'sha', pkgid: 'YES', packageMd.sha1Digest)
        summary(packageMd.summary)
        description(packageMd.description)
        packager(packageMd.packager)
        url(packageMd.url)
        time(file: packageMd.lastModified, build: packageMd.buildTime)
        size(package: packageMd.size, installed: packageMd.installedSize, archive: packageMd.archiveSize)
        location(href: packageMd.artifactRelativePath)
        format {
            'rpm:license'(packageMd.license)
            'rpm:vendor'(packageMd.vendor)
            'rpm:group'(packageMd.group)
            'rpm:buildhost'(packageMd.buildHost)
            'rpm:sourcerpm'(packageMd.sourceRpm)
            'rpm:header-range'(start: packageMd.headerStart, end: packageMd.headerEnd)
            ...
I promise you we don't lie
Don't ask me which country it is; I made it up and I was drunk
Per minute
DL Server, DL Server, DL Server
Redis
Per day
Per country
Geo IP
Mongo (UI)
Downloadable log files
Not part of the webapp but a monolith
Stats data is fat and we can't afford to retain it. We would rather suffer the overhead asynchronously.
100500 . .
Kekekekekekekekekeke
Not the actual datacenter
http://www.mengsbizarreadventure.com/2010/starcraft-2-betaaka-zerg-rush-kekekeke-hd/
Incremental update is a PITA
Per minute, per day, per country
Mongo (UI)
Downloadable log files
Proximity to datacenters
Geo IP
http://galleryhip.com/black-and-white-fight-scene-kill-bill.html
We cannot allow it to disturb either the webapp or the download server
As quick as possible, but throttled should it disturb any service.
Can be interrupted, so that it doesn't go kamikaze.
3-
Gather
Format
Scatter
What do I mean? Sounds like a motto; sounds like Steve Ballmer. I assure you these aren't just buzzwords.
Gather: download servers are spewing out info. BAM!
Format: regain data lost when reported.
Scatter: make sure that the services facing the user get the information they need. Quick!
Redis
Dispatcher, Redis
Protobuf (Gradle)
Redis' atomic ops
Link the different stages with dispatchers. They ensure that no data is lost between phases.
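A minimal sketch of the gather, format, dispatch idea: each download event is handed to every formatter, and each formatter bumps an atomic counter. The `Counters` class is an in-memory stand-in for Redis (its `incr` mimics the atomic INCR command); the event fields and key formats are hypothetical, and Protobuf serialization is left out.

```groovy
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.atomic.AtomicLong

// In-memory stand-in for Redis counters; incr() is atomic like Redis INCR
class Counters {
    private final ConcurrentHashMap<String, AtomicLong> store = new ConcurrentHashMap<>()
    long incr(String key) { store.computeIfAbsent(key, { new AtomicLong() }).incrementAndGet() }
    long get(String key) { store.get(key)?.get() ?: 0L }
}

def counters = new Counters()

// Formatters: each turns a raw download event into the key it aggregates under
def formatters = [
    minute : { e -> "minute:${e.timestamp.substring(0, 16)}" },
    day    : { e -> "day:${e.timestamp.substring(0, 10)}" },
    country: { e -> "country:${e.country}" }
]

// Dispatcher: hands every gathered event to every formatter
def dispatch = { event ->
    formatters.values().each { f -> counters.incr(f(event).toString()) }
}

[
    [timestamp: '2015-03-01T10:15:42', country: 'IL'],
    [timestamp: '2015-03-01T10:15:59', country: 'US'],
    [timestamp: '2015-03-01T10:16:03', country: 'IL']
].each(dispatch)

assert counters.get('minute:2015-03-01T10:15') == 2
assert counters.get('day:2015-03-01') == 3
assert counters.get('country:IL') == 2
```

Because each increment is a single atomic operation, several download servers can report at once without a read-modify-write race on the counters.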
Dispatcher
Minute formatter, day formatter, country formatter
Dispatcher
Mongo, Geo IP, Whois
UI scatterer, log file scatterer
DL Server
HTTP commands
Crash commands
Controller
DL Server, Controller, Dispatchers
Formatters
Scatterers
...
:)
Especially if the infra, domain, team, or concept is new.
It'll let you get the hang of things before you've fully committed and gone ahead.