ln monitoring repositories
TRANSCRIPT
![Page 1: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/1.jpg)
Monitoring repositories for FUN and PROFIT
@snyff [_]
![Page 2: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/2.jpg)
About me
● Security consultant (C.T.O.) working for Securus Global in Melbourne
● PentesterLab (.com):
  ○ cool/awesome (web) *free* training/exercises
  ○ real life scenario
![Page 3: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/3.jpg)
Disclaimer
● No code is going to be released today
● No repositories were harmed during the preparation of this talk
● I worked on Web and Open Source projects
● I worked on commits without using the entire project's source code
![Page 4: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/4.jpg)
Why work on commits?
● Corporate development:
  ○ Cannot review all projects anymore
  ○ Nice to have a “what to check today”
  ○ Sort commits by criticality
  ○ Detect backdoors
● Agile development:
  ○ The code changes every day
  ○ Can’t rely on one-time code review anymore
  ○ Current approach: daily scan
![Page 5: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/5.jpg)
Why work on commits?
● You have vulnerabilities:
  ○ Detect patches affecting your bugs
  ○ Detect changes to sensitive functions
![Page 6: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/6.jpg)
Why work on commits?
● You want vulnerabilities ($$):
  ○ Detect new features with dangerous functions
  ○ Detect changes to sensitive functions
![Page 7: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/7.jpg)
Why work on commits?
● You want bugs (lulz):
  ○ Get bugs a few hours before the patch is available
  ○ Get a list of examples of bad practices
  ○ Detect silent patching
![Page 8: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/8.jpg)
What's a repository?
● Developers
● Files
● Commits
● And all of these are constantly moving...
![Page 9: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/9.jpg)
Developers
● Main developer(s):
  ○ Add features
  ○ Fix bugs
● Cosmetic committer(s):
  ○ Change comments (fix typos)
  ○ Change the design of the website
  ○ Change indentation
  ○ Add documentation
● External people:
  ○ Do a bit of everything
![Page 10: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/10.jpg)
Files
● README/LICENSE files
● Templates, HTML, CSS
● Images
● Code:
  ○ Libraries
  ○ Installation code
  ○ "normal" code
![Page 11: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/11.jpg)
Commits
● Developer's name
● Code changes:
  ○ Changes: diff
  ○ Files changed
  ○ Number of deletions/additions
● Date/Time of the commit
● Message
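The commit fields listed above can be pulled out of plain `git log` output. The following is a minimal sketch, not the tool's actual code: the `Commit` struct, the `parse_git_log` helper and the `|`-separated header format are assumptions made for illustration. It parses the text printed by `git log --numstat --format='commit %H|%ae|%aI|%s'`.

```ruby
# Hypothetical value object; the field names are illustrative only.
Commit = Struct.new(:sha, :author, :date, :message, :files, :insertions, :deletions)

# Parses the text printed by:
#   git log --numstat --format='commit %H|%ae|%aI|%s'
def parse_git_log(text)
  commits = []
  text.each_line do |line|
    line = line.chomp
    if line.start_with?("commit ")
      # Header line: hash, author email, author date, subject.
      sha, author, date, message = line.sub("commit ", "").split("|", 4)
      commits << Commit.new(sha, author, date, message, [], 0, 0)
    elsif (m = line.match(/\A(\d+|-)\t(\d+|-)\t(.+)\z/)) && commits.any?
      # numstat line: added<TAB>deleted<TAB>path ("-" is used for binary files).
      commit = commits.last
      commit.files << m[3]
      commit.insertions += m[1].to_i # "-".to_i is 0
      commit.deletions  += m[2].to_i
    end
  end
  commits
end

# Made-up sample in the format described above.
SAMPLE_LOG = <<~LOG
  commit abc123|dev@example.org|2012-10-20T10:00:00+11:00|Escape user name in profile view

  3\t1\tapp/views/profile.erb
  -\t-\tpublic/logo.png
LOG

parse_git_log(SAMPLE_LOG).each do |c|
  puts "#{c.sha} #{c.author} +#{c.insertions}/-#{c.deletions} #{c.files.join(', ')}"
end
```

Parsing text rather than going through a git library keeps the sketch free of dependencies; the sample log is invented for the example.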
![Page 12: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/12.jpg)
![Page 13: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/13.jpg)
Examples of projects monitored
![Page 14: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/14.jpg)
Stats (on the last 5000 commits)
● Commits per week:
  ○ anywhere between 20 and 180 (phpmyadmin) per week
  ○ 40 commits per week seems to be the average for "normal/interesting" projects
● Authors:
  ○ between 1 and 140
● Average commit: 200 lines (insertions+deletions)
![Page 15: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/15.jpg)
Goals...
![Page 16: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/16.jpg)
Goals: counterexample
![Page 17: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/17.jpg)
Goals: example
![Page 18: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/18.jpg)
Goals: example
![Page 19: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/19.jpg)
Goals: example
![Page 20: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/20.jpg)
Filtering...
![Page 21: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/21.jpg)
Filtering files
● General approach:
  ○ images
  ○ css
  ○ README
● Framework based:
  ○ tests (interesting to keep for some projects)
  ○ database migration/creation scripts
● Project based files:
  ○ deployment
  ○ installation files
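A file filter along these lines can be sketched in a few lines of Ruby. The extension and pattern lists below mirror the ones shown later in the configuration slide, plus a `README*` pattern; the `noise_file?` helper name is an assumption, not part of the tool.

```ruby
# Extensions and glob patterns treated as noise (illustrative lists).
IGNORED_EXTENSIONS = %w[html css jpg png md tpl].freeze
IGNORED_PATTERNS   = ["LICENSE", "README*", "*test*"].freeze

# Returns true when a changed file is unlikely to contain interesting code.
def noise_file?(path)
  ext = File.extname(path).delete_prefix(".").downcase
  return true if IGNORED_EXTENSIONS.include?(ext)

  name = File.basename(path)
  IGNORED_PATTERNS.any? do |pattern|
    # Without FNM_PATHNAME, "*" also matches "/", so "*test*" catches test directories.
    File.fnmatch(pattern, name, File::FNM_CASEFOLD) ||
      File.fnmatch(pattern, path, File::FNM_CASEFOLD)
  end
end

changed = ["public/logo.png", "activerecord/test/cases/base_test.rb",
           "activerecord/lib/active_record/core.rb"]
puts changed.reject { |f| noise_file?(f) }
```

A glob as broad as `*test*` also hides files such as `latest.rb`, so per-project tuning is probably needed, and tests may be worth keeping for some projects.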
![Page 22: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/22.jpg)
Filtering developers
● For a given project find the "cosmetic developers"
● Don't get me wrong: they are not useless, they just do things I don't care about
![Page 23: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/23.jpg)
Results
● Around 5-10% of commits have nothing to do with code...
● You can divide the size of most other commits by 2-3 if you ignore noise (files/comments/...):
  ○ new code with test cases
  ○ modification in comments
  ○ ...
![Page 24: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/24.jpg)
Classification
![Page 25: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/25.jpg)
Data mining
● Take your samples (commits):
  ○ Extract a vector from each sample
  ○ Classify each sample
● From a training set, learn to classify the data
● Apply what you learned:
  ○ to the same training set after splitting it (cross-validation)
  ○ to new samples
![Page 26: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/26.jpg)
Data mining
● training set:
  [1,2,3,0,10,220] -> bugfix
  [2,4,3,0,1,0] -> boring
  [2,5,3,3,1,1] -> boring
  [20,1,0,100,0,10] -> new bug
● testing:
  [23,0,1,90,0,15] -> ???
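To make the training/testing idea concrete, here is a toy 1-nearest-neighbour classifier over the vectors from this slide. It is only an illustration: the talk does the data mining with Weka and does not say which algorithm was used, so nearest-neighbour is a stand-in chosen for brevity.

```ruby
# Training vectors and labels taken from the slide.
TRAINING_SET = [
  [[1, 2, 3, 0, 10, 220],  "bugfix"],
  [[2, 4, 3, 0, 1, 0],     "boring"],
  [[2, 5, 3, 3, 1, 1],     "boring"],
  [[20, 1, 0, 100, 0, 10], "new bug"]
].freeze

# Squared Euclidean distance between two vectors of equal length.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Returns the label of the closest training sample (1-nearest-neighbour).
def classify(vector, training_set = TRAINING_SET)
  _, label = training_set.min_by { |sample, _| squared_distance(vector, sample) }
  label
end

puts classify([23, 0, 1, 90, 0, 15]) # prints "new bug"
```

The test vector lands next to the "new bug" sample (squared distance 136, against more than 8000 for every other sample). With real commits the features would need scaling first, otherwise the largest column dominates the distance.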
![Page 27: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/27.jpg)
Extracting a vector
● You can't really say a commit is close to another commit
● You need to generate a vector from each commit to compare them
● Once you have done that, everything else is just magic^W Maths
![Page 28: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/28.jpg)
Extracting a vector: getting data
● Number of lines changed:
  ○ insertion vs deletion
● Number of words changed (--word-diff):
  ○ insertion vs deletion
● Authors:
  ○ rating of authors based on the project's history
    ■ "fixing" score
    ■ "vulnerability creator" score
  ○ new developers
  ○ known security researchers
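The line and word counts above can be derived from git's text output. A rough sketch, assuming the input is a unified diff from `git show`/`git diff` and, for words, the default plain `--word-diff` output with its `[-removed-]` and `{+added+}` markers; the helper names are made up for the example.

```ruby
# Counts inserted/deleted lines in a git unified diff.
# Only lines inside a hunk (after an "@@" header) are counted, so the
# "--- a/file" and "+++ b/file" headers are not mistaken for changes.
def line_counts(diff)
  counts = { insertions: 0, deletions: 0 }
  in_hunk = false
  diff.each_line do |line|
    if line.start_with?("diff --git")
      in_hunk = false
    elsif line.start_with?("@@")
      in_hunk = true
    elsif in_hunk && line.start_with?("+")
      counts[:insertions] += 1
    elsif in_hunk && line.start_with?("-")
      counts[:deletions] += 1
    end
  end
  counts
end

# Counts inserted/deleted words in `git diff --word-diff` (plain mode) output.
def word_counts(word_diff)
  added   = word_diff.scan(/\{\+(.*?)\+\}/).flatten.sum { |s| s.split.size }
  removed = word_diff.scan(/\[-(.*?)-\]/).flatten.sum { |s| s.split.size }
  { insertions: added, deletions: removed }
end

# Made-up sample diff for illustration.
SAMPLE_DIFF = <<~'DIFF'
  diff --git a/view.php b/view.php
  --- a/view.php
  +++ b/view.php
  @@ -1,2 +1,2 @@
   <?php
  -echo $_GET['name'];
  +echo htmlentities($_GET['name']);
DIFF

SAMPLE_WORD_DIFF = 'return [-$id;-]{+intval($id);+}'

p line_counts(SAMPLE_DIFF)
p word_counts(SAMPLE_WORD_DIFF)
```

The word-level counts are what separate a one-token substitution (typical of a fix) from a rewritten line, which is presumably why both measures appear in the vector.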
![Page 29: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/29.jpg)
Extracting a vector: getting data
● Number of "dangerous" functions:
  ○ insertion
  ○ deletion
● Number of "filtering" functions:
  ○ insertion
  ○ deletion
● commit date vs author date
● Keywords in the message and in the code
![Page 30: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/30.jpg)
Extracting a vector: getting data
● Files modified:
  ○ already implicated in a bug fix
  ○ already implicated in a vulnerability
![Page 31: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/31.jpg)
Filtering vs Dangerous
● Good list of "dangerous" signatures from graudit:
  ○ https://github.com/wireghoul/graudit/
● Weighting is *really* important:
  ○ echo -> potential XSS -> 1 point
  ○ system -> potential command execution -> 10 points
● Some functions are in both:
  ○ crypto functions for example
  ○ crypto can be dangerous but can filter as well
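The weighting idea can be sketched as a small scoring function. Only the `echo` (1 point) and `system` (10 points) weights come from the slide; the other entries and the `dangerous_score` helper are assumptions for illustration, and the word-boundary regex is deliberately naive.

```ruby
# Only echo => 1 and system => 10 come from the talk; exec and eval are
# hypothetical additions to show how the table would grow.
DANGEROUS_WEIGHTS = { "echo" => 1, "system" => 10, "exec" => 10, "eval" => 10 }.freeze

# Sums the weight of every dangerous function name found in the given lines.
# Pass the inserted lines and the deleted lines separately to get both scores.
def dangerous_score(lines, weights = DANGEROUS_WEIGHTS)
  lines.sum do |line|
    weights.sum do |name, weight|
      line.scan(/\b#{Regexp.escape(name)}\b/).size * weight
    end
  end
end

inserted = ['system($cmd);', 'echo $out;']
deleted  = ['echo $name;']
puts "inserted: #{dangerous_score(inserted)}, deleted: #{dangerous_score(deleted)}"
```

Because `_` counts as a word character, `curl_exec` does not match `exec` here, but comments and string literals still do, which is one reason purely regex-based matching produces false alerts.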
![Page 32: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/32.jpg)
Filtering vs Dangerous
system
htmlentities
exec
echo
create_function
preg_replace
intval
basename
File.basename
open3
popen
escape
attr_accessible
attr_protected
assert
eval
send
![Page 33: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/33.jpg)
Keywords
SQL injection
Command exec
Typo
Cross Site Scripting
CVE
Dangerous
CSS rules
CSS selector
Directory traversal
Code execution
vulnerability
XSS
Version number
Changelog
description
Documentation
punctuation
Security
Risky
CSRF
disclosure
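Keywords like the ones on this slide can be turned into two more vector components by counting matches in the commit message. In the sketch below, the split into "security" and "cosmetic" groups is an assumption (the slide shows the words without grouping), and `keyword_hits` is a hypothetical helper.

```ruby
# Assumed grouping of the slide's keywords; adjust per project.
SECURITY_KEYWORDS = ["sql injection", "command exec", "cross site scripting", "xss",
                     "csrf", "cve", "directory traversal", "code execution",
                     "vulnerability", "security", "disclosure"].freeze
COSMETIC_KEYWORDS = ["typo", "css", "changelog", "documentation",
                     "punctuation", "version number"].freeze

# Number of distinct keywords that appear as whole words in the message.
def keyword_hits(message, keywords)
  text = message.downcase
  keywords.count { |kw| text.match?(/\b#{Regexp.escape(kw)}\b/) }
end

msg = "Fix XSS and CSRF in comment form"
puts "security: #{keyword_hits(msg, SECURITY_KEYWORDS)}, " \
     "cosmetic: #{keyword_hits(msg, COSMETIC_KEYWORDS)}"
```

Counting cosmetic keywords is as useful as counting security ones: a message that only mentions typos or CSS selectors is a strong hint that the commit can be skipped.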
![Page 34: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/34.jpg)
Classification
● Fixed bugs:
  ○ learn from dangerous keywords
● New bugs:
  ○ git blame
  ○ read the source code and classify manually
● Potentially interesting new feature:
  ○ read the source code
  ○ can be a new bug
![Page 35: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/35.jpg)
Results
● Vector computation:
  ○ between 15 and 120 minutes for 5000 commits
● Classification:
  ○ less than a minute
● Scoring:
  ○ 90% success rate on bug fix (without using the message as part of the vector)
  ○ 50/50 between FP and FN on bug fix
  ○ 200 commits down to 5-10 bugs per day
![Page 36: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/36.jpg)
My tool: SANZARU
● Japanese names for tools make you a Ninja ;)
● Ruby based (what else...)
● Data Mining done with Weka (thx Silvio)
![Page 37: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/37.jpg)
SANZARU: virtuous circle
● Made in a way that the more you learn on a project the more effective it gets :)
● Score authors through learning
● Score files through learning
● Add functions used by the project
![Page 38: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/38.jpg)
SANZARU: "learning mode"
● Takes the last 5k commits and gives you the list of impacted files and authors with a weight
● Still working on finding the initial bug's author, but it doesn't really give you more information
![Page 39: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/39.jpg)
SANZARU: configuration file
configure({
  :path => "/home/snyff/code/rails",
  :type => :git,
  :remote => "origin/master",
  :origin => "https://github.com/rails/rails",
  :languages => [ :ruby ]
})
filter({
  :extensions => [ :html, :css, :jpg, :png, :md, :tpl ],
  :files => ["LICENSE", "*test*"]
})
alert({ :keywords => [keywords_default]... })
![Page 40: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/40.jpg)
SANZARU: configuration file
classify(
  :authors => {
    :default => 0,
    "[email protected]" => 19,
    "[email protected]" => 15,
    "[email protected]" => 8,
    "[email protected]" => 11,
    "[email protected]" => 25,
    ....
  },
  :files => {
    :default => 0,
    "activemodel/lib/active_model/mass_assignment_security.rb" => 20,
    "railties/lib/rails/application.rb" => 17,
    "actionpack/lib/action_view/helpers/form_helper.rb" => 17,
    "activerecord/lib/active_record/core.rb" => 17,
    ...
  })
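How such weight tables might feed a commit's vector can be sketched as follows. The slides do not say how SANZARU combines the weights, so summing the author weight and the per-file weights is an assumption, as are the helper names and the `dev@example.org` address; the two file weights are copied from the slide.

```ruby
# Hypothetical author table; the address is a placeholder, not a real committer.
AUTHOR_WEIGHTS = { :default => 0, "dev@example.org" => 19 }.freeze

# File weights copied from the configuration slide.
FILE_WEIGHTS = {
  :default => 0,
  "activemodel/lib/active_model/mass_assignment_security.rb" => 20,
  "railties/lib/rails/application.rb" => 17
}.freeze

# Looks up a key, falling back to the table's :default entry (or 0).
def weight_for(table, key)
  table.fetch(key, table.fetch(:default, 0))
end

# Assumed combination: author weight plus the weight of every touched file.
def commit_weight(author, files, authors = AUTHOR_WEIGHTS, file_weights = FILE_WEIGHTS)
  weight_for(authors, author) + files.sum { |f| weight_for(file_weights, f) }
end

puts commit_weight("dev@example.org",
                   ["activemodel/lib/active_model/mass_assignment_security.rb", "README.md"])
```

Unknown authors and files fall back to `:default`, which matches the `:default => 0` entries in the configuration above.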
![Page 41: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/41.jpg)
SANZARU: "classification mode"
● Using ruby to create all the vectors
● Using weka to classify the data
● Then manual review of the results:
  ○ New features to find security bugs
  ○ FP for possible silent patching
![Page 42: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/42.jpg)
SANZARU: "daily mode"
● Cron job (every day):
  ○ update all repositories (hasn't been blacklisted by github...yet), ruby-git is *shit*
  ○ find alerts in new commits
  ○ classify new commits
  ○ give me a nice report with what to read
![Page 43: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/43.jpg)
SANZARU: example of output
![Page 44: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/44.jpg)
Example found this week (not exploitable... yet):
esc_js escapes ' and "... this doesn't
![Page 45: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/45.jpg)
Example found this week:
![Page 46: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/46.jpg)
Example found this morning:
![Page 47: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/47.jpg)
General observations
● Most fixes are:
  ○ small code insertion (less than 10 lines)
  ○ basic line substitution
  ○ easy to detect
● Most new bugs are:
  ○ details...
  ○ really hard to detect statistically
  ○ general approach: read all potentially interesting commits
  ○ working on important projects makes the creation of bugs far less likely
  ○ it's not going to rain 0dayz...
![Page 48: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/48.jpg)
Possible improvements
● Integrating syntactic analysis:
  ○ regular expressions are just not enough
  ○ false alerts are time consuming...
● Retrieve information from external sources:
  ○ bug reports
  ○ CVE
● Support for more languages/platforms:
  ○ Objective C libraries and applications?
  ○ Linux kernel?
  ○ ...
![Page 49: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/49.jpg)
Conclusion
● Easy to detect:
  ○ (Silent) Security Fixes
  ○ New features with "interesting" functions
● Not so easy to detect:
  ○ New security bugs
● Still worth the time:
  ○ if you want bugs
  ○ if you are doing code review, to have examples to learn from or share: vulnerability patterns
  ○ most frustrating thing you can do?
![Page 50: Ln monitoring repositories](https://reader030.vdocument.in/reader030/viewer/2022032419/55a273701a28abf46b8b4693/html5/thumbnails/50.jpg)
Questions?
@snyff
● Have a great Ruxcon
● Play the CTF and Lock Picking
● Remember to check out:
  ○ PentesterLab.com
  ○ @PentesterLab
● Thx to everyone who helped me put this talk together