Sensitive Information Detection & Reporting
Feasibility Presentation v2
ODU CS 410 Red Team, April 1, 2020
Scan and crawl for leaks before your data gets breached!
Outline
● Team Biography
● Problem Statement
● Problem Characteristics
● Who is affected?
● ODU as a Case Study
● Who handles the information?
● Threats in Recent Years
● Information/Organization Maturity
● Solution Statement
● Solution Characteristics
● SIDR Software Input/Output Diagram
● SIDR Process Flow
● A Day in the Life of a Customer Process Flow
● A Day in the Life with SIDR Process Flow
● Process Flow Comparison
● Major Functional Component Diagram
● What the solution will not do
● Types of Sensitive Information that this Program will Seek to Find
● Benefits to Customers
● Competition Matrix - Detection
● Competition Matrix - Reporting
● Risk Matrix
● Technical Risk Mitigation
● Customer Risk Mitigation
● Review - Key Points
● References
● References Continued
● Appendix
Feasibility Presentation v2 | Sensitive Information Detection and Reporting | CS 410 | April 1, 2020
Team Biography
Andrew Paterson, Software Developer
Cameron Allen, Project Lead
Evan Mulloy, Web & Software Developer
Dane Bruce, Web & Software Developer
Michael Hewitt, Documentation Specialist
Kasey Howlett, Database Engineer
Problem Statement
For enterprises of any size, the release of sensitive information can provide bad actors unauthorized access to private digital assets, potentially resulting in tarnished reputations and costing millions of dollars in damage.
Problem Characteristics
● Large breaches can develop from small mistakes [1] [2]
● Private or confidential information is sometimes placed in publicly accessible files or web pages [10]
● Public file/web page directories may be shared with individuals who are no longer active [15]
● Many organizations do not know every location that contains their confidential information [11]
Figure: IBM Cost of a Data Breach Report [8]
Who is affected?
Universities
Small to midsize companies
Research labs
Non-profit organizations
Private medical practices
53% of mid-market companies suffered a breach in 2018 [11]
Case Study
The ODU Computer Science department serves as a case study:
● It runs its own network
● Its setup differs from the University’s overarching design
● Many students contribute to its public-facing web content, and students make mistakes
• The Systems Group is run by students, and they have a lot of leeway
• The CS 411W pages are student-built
• The graduate project pages are built by students
● ODU must be FERPA-compliant like any other university
• UINs are a unique form of sensitive information
• ODU could be sued if the privacy of UINs or grades were compromised
Who Handles The Information?
○ Registrar
○ Financial Aid
○ Human Resources
○ Finance Department
○ CISOs
○ Sales people
○ Researchers
○ Volunteers
○ Owners
○ Nurses
○ Receptionists
Threats In Recent Years
● Approximately 74% of breaches in 2018 stemmed from credential abuse [6]
● 751,133,653 Google credentials leaked on paste sites and blackhat forums had a valid password match rate of 6.9% [7]
● In 2014, hackers used Amazon EC2 to scrape sensitive data from “hundreds of thousands of member profiles” on LinkedIn [9]
● Automated bots are used to scrape sensitive data [1] [10], such as email addresses to use for ‘spear phishing’ [10] and ‘whaling’ [16]
Information/Organization Maturity
The most important asset of any business is its proprietary data.
“More than 87 percent of organizations are classified as having low business intelligence and analytics maturity” - Gartner, December 2018 [13]
“Through 2019, 10% of organizations will have established operational information stewardship in line-of-business functions.” - Gartner Enterprise Information Management Maturity Model [14]
Figure: Gartner Information Management Building Blocks [14]
The SIDR Solution
SIDR employs proprietary tools to scour web content for sensitive data,
keeping enterprises notified of potential data leaks and unauthorized
access to sensitive information.
Solution Characteristics
● Mitigates exposure to data leaks that were caused by human error
● Allows for the correction of small mistakes that could potentially become a large data breach
● Allows automated searching of specific local and public-facing locations that could potentially contain confidential information
● Finds private or confidential information that was mistakenly put on public files or web pages
● Allows searching of public file/web page directories that are no longer actively maintained
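The automated search of local locations described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the SIDR implementation: the function name `scan_directory`, the list of file suffixes, and the single SSN-shaped placeholder pattern are all hypothetical choices made for the example.

```python
import re
from pathlib import Path

# Placeholder detector (assumption): text shaped like a U.S. SSN.
SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_directory(root, pattern=SSN_LIKE, suffixes=(".txt", ".html", ".csv")):
    """Yield (path, line_number, match) for every pattern hit under root."""
    for path in sorted(Path(root).rglob("*")):
        # Skip directories and file types we do not treat as text.
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for m in pattern.finditer(line):
                yield path, line_no, m.group()


if __name__ == "__main__":
    # "." is only an example root; point it at a web root or shared folder.
    for path, line_no, match in scan_directory("."):
        print(f"{path}:{line_no}: {match}")
```

Walking the tree recursively also reaches directories that nobody actively maintains, which is where forgotten files tend to sit.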
SIDR Software Input/Output Diagram
Inputs: Web Pages, Files
Output: Alerts and Reporting
Software Process Flow
Current Process Flow
Solution Process Flow
Process Flow Comparison
Major Functional Component Diagram
What the solution will not do
● Erase leaked information
● Change compromised login credentials
● Inform you about who knows your compromised login credentials
● Protect your system against malware
● Perform as an Intrusion Detection System (IDS)
Types of sensitive information that this program will seek to find
Business Information Personal Information
● Customer Information
○ Addresses
○ Email addresses
○ Identifying information
○ Phone numbers
○ Contact information
○ Payment information
● Company Information
○ Company bank account information
● Employee Information
○ SSN
○ Banking
○ Contact information
○ Log-in credentials
○ SSH keys
● Faculty and Student Information
○ UINs
○ Grades
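Several of the categories above have a recognizable textual shape, so a first pass at detection can be sketched with regular expressions. The patterns below are illustrative assumptions, not SIDR's actual definitions: they are deliberately simple, U.S.-centric, and will produce both false positives and misses. Institution-specific identifiers such as UINs are omitted because their format would have to be supplied by the institution.

```python
import re

# Illustrative patterns only (assumptions); real data needs validation and tuning.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    # Header line of a PEM-encoded private key, e.g. an SSH key file.
    "private_key": re.compile(
        r"-----BEGIN (?:RSA |OPENSSH |EC |DSA )?PRIVATE KEY-----"
    ),
}


def find_sensitive(text: str) -> list[tuple[str, str]]:
    """Return (category, matched_text) pairs, grouped in PATTERNS order."""
    findings = []
    for category, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append((category, match.group()))
    return findings


if __name__ == "__main__":
    sample = "Reach me at jdoe@example.edu or 757-555-0100."
    for category, value in find_sensitive(sample):
        print(category, value)
```

Keeping the patterns in a dictionary keyed by category makes it straightforward to add, remove, or replace a definition without touching the scanning loop.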
Benefits to Customer
Ability to:
• Identify sensitive information that has been exposed on the web
• Scan local files to identify potentially sensitive information
• Customize and prioritize definitions of sensitive information
Reduction of:
• Risk of data loss
• Time elapsed between data leak and leak detection
• Liability for legal violations
Identification of:
• Sources of repetitive leaks
• Specific areas of the file system
• Individuals with repeat offenses
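Identifying repeat sources comes down to counting findings per location or per person. A minimal sketch, assuming each finding is recorded as a hypothetical `(directory, owner)` pair and that "repeat" means at least two findings; the paths and names are invented for the example:

```python
from collections import Counter


def repeat_offenders(keys, min_count=2):
    """Return {key: count} for keys seen at least min_count times, most frequent first."""
    counts = Counter(keys)
    return {key: n for key, n in counts.most_common() if n >= min_count}


# Hypothetical findings: (directory where the leak was found, responsible user).
findings = [
    ("/www/cs411/team1", "alice"),
    ("/www/cs411/team1", "alice"),
    ("/www/grad/projects", "bob"),
    ("/www/cs411/team2", "alice"),
]

by_directory = repeat_offenders(path for path, _ in findings)
by_owner = repeat_offenders(owner for _, owner in findings)

if __name__ == "__main__":
    print(by_directory)
    print(by_owner)
```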
Competition Matrix - Detection
Competition Matrix - Reporting
Risk Matrix
Probability (rows) vs. Severity (columns)

            Negligible  Marginal    Moderate  Major  Critical
Certain
Likely                  T5
Possible    T7          C5, C4, C3  T6               T1
Unlikely                            T2, T4    C2     C1
Rare                                                 T3

Technical Risks:
T1 (“Incomplete search”): Sensitive information isn’t found, and because of that, a data breach occurs
T2 (“Spider trap”): The web crawler enters a spider trap
T3 (“AI technology”): AI technology exceeds the capabilities of traditional software development
T4 (“Insufficient power”): Computational power to process file/webpage analysis is not available
T5 (“Broken functionality”): Updates to Python or common operating systems break functionality
T6 (“Insufficient storage”): Storage needs during operation become greater than storage capacity
T7 (“Blank search results”): No sensitive information is found for an extended period, possibly because it is not there

Customer Risks:
C1 (“Hacker customer”): A hacker gains access to the database of found sensitive information that hasn’t been resolved yet
C2 (“False positives”): Excessive false positives cause customers to lose interest in the product
C3 (“Unsatisfactory GUI”): Customers aren’t satisfied with the GUI
C4 (“Customer inactivity”): Customers ignore reports of found information
C5 (“Customer tardiness”): Customers do not fix leakage in a timely manner, and their data is stolen
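The matrix positions can be turned into a rough numeric ranking. The sketch below assumes the common convention of scoring a risk as severity times probability on 1 to 5 scales; the slides place the risks on the matrix but do not state a scoring formula, so both the product and the function name `rank_risks` are assumptions.

```python
# (severity, probability) on 1-5 scales, read off the risk matrix:
# severity 1 = Negligible ... 5 = Critical; probability 1 = Rare ... 5 = Certain.
RISKS = {
    "T1": (5, 3), "T2": (3, 2), "T3": (5, 1), "T4": (3, 2),
    "T5": (2, 4), "T6": (3, 3), "T7": (1, 3),
    "C1": (5, 2), "C2": (4, 2), "C3": (2, 3), "C4": (2, 3), "C5": (2, 3),
}


def rank_risks(risks):
    """Sort risk IDs by severity * probability, highest first; ties keep input order."""
    return sorted(risks, key=lambda rid: risks[rid][0] * risks[rid][1], reverse=True)


if __name__ == "__main__":
    for rid in rank_risks(RISKS):
        severity, probability = RISKS[rid]
        print(rid, severity * probability)
```

Under this assumed scoring, T1 (incomplete search) ranks highest at 15, followed by C1 at 10 and T6 at 9.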
Technical Risk Mitigations
Each risk is labeled with its (severity, probability) position on the risk matrix.
T1 (5,3) Incomplete search: Allow customers to fine-tune what counts as “sensitive” information to improve search algorithms
T2 (3,2) Spider trap: Maintain a list of web pages that have already been crawled
T3 (5,1) AI technology: Adjust the scope to focus on what is possible with traditional software development
T4 (3,2) Insufficient power: Transfer responsibility to the customer
T5 (2,4) Broken functionality: Continue monitoring the risk and release frequent patches
T6 (3,3) Insufficient storage: Transfer responsibility to the customer
T7 (1,3) Blank search results: Increase sensitivity or accept that all sensitive information was found and is not currently present
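The T2 mitigation, keeping a list of pages already crawled, can be sketched as a breadth-first crawl over a visited set. This is an illustration under assumptions: `get_links` is a hypothetical caller-supplied function that returns the hrefs found on a page, and `max_pages` is an extra guard (not mentioned in the slides) for traps that generate endless unique URLs.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl that never visits the same URL twice.

    get_links(url) is a caller-supplied (hypothetical) function returning the
    hrefs found on that page. Returns the URLs in the order they were visited.
    """
    start = urldefrag(start_url).url
    visited = {start}          # every URL ever queued, so loops are cut short
    queue = deque([start])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments before comparing.
            target = urldefrag(urljoin(url, href)).url
            if target not in visited:
                visited.add(target)
                queue.append(target)
    return order


if __name__ == "__main__":
    # A tiny in-memory "site" whose pages all link back to each other.
    site = {
        "http://example.edu/": ["/a", "/b#top"],
        "http://example.edu/a": ["/", "/b"],
        "http://example.edu/b": ["/a", "/"],
    }
    print(crawl("http://example.edu/", lambda u: site.get(u, [])))
```

Normalizing each link before the membership test matters: without it, `/b` and `/b#top` would count as different pages and be crawled twice.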
Customer Risk Mitigations
C1 (5,2) Hacker customer: Strict authentication and authorization controls within the SIDR solution and the associated databases - at least equivalent to customer’s standards for protecting sensitive databases
C2 (4,2) False positives: Allow the user to specify a level of sensitivity for the lexical analyzer
C3 (2,3) Unsatisfactory GUI: Use feedback from surveys and focus groups to improve the GUI in future updates
C4 (2,3) Customer inactivity: Transfer responsibility to the customer
C5 (2,3) Customer tardiness: Transfer responsibility to the customer
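One way to read the C2 mitigation is as a confidence threshold: each candidate match gets a score, and the user's sensitivity setting decides how low a score is still reported. The sketch below is a hypothetical illustration, not the SIDR lexical analyzer; the scores 0.5 and 0.9 and the keyword list are arbitrary assumptions.

```python
import re

SSN_LIKE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CONTEXT = re.compile(r"\b(ssn|social security)\b", re.IGNORECASE)


def score(line: str) -> float:
    """Hypothetical confidence: 0.0 no match, 0.5 bare pattern, 0.9 with a keyword nearby."""
    if not SSN_LIKE.search(line):
        return 0.0
    return 0.9 if CONTEXT.search(line) else 0.5


def flag_lines(lines, min_confidence):
    """Keep the lines whose score meets the user-chosen threshold."""
    return [line for line in lines if score(line) >= min_confidence]


if __name__ == "__main__":
    lines = ["SSN: 123-45-6789", "part no. 555-12-3456", "hello"]
    print(flag_lines(lines, 0.8))  # strict: fewer false positives
    print(flag_lines(lines, 0.4))  # lenient: fewer misses
```

A higher sensitivity setting corresponds to a lower `min_confidence`: more is reported, at the cost of more false positives.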
SIDR: Your solution for finding sensitive data before the bad actors do!
Benefits:
● By automating the search for data leaks, valuable time is saved
● By finding data leaks early, potential data breaches are stopped
● By preventing potential data breaches, your company saves money and preserves its reputation
References
1. Winder, D. (2019, August 20). Data Breaches Expose 4.1 Billion Records In First Six Months Of 2019. Retrieved from Forbes website: https://www.forbes.com/sites/daveywinder/2019/08/20/data-breaches-expose-41-billion-records-in-first-six-months-of-2019/
2. Funke, D. (2019, September 23). Public data breaches have increased over the past decade [Graph]. Retrieved from PolitiFact website: https://www.politifact.com/article/2019/sep/23/numbers-how-common-are-data-breaches-and-what-can-/
3. Kirby, D. (2018, May 21). Five Types of Sensitive Data Almost All Companies Handle [Blog post]. Retrieved from https://kirbside.com/blog/five-types-of-sensitive-data-almost-all-companies-handle/
4. Steinberg, J. (2018, April 28). 12 Types Of Data That Businesses Need To Protect But Often Do Not. Retrieved from https://josephsteinberg.com/12-types-of-data-that-businesses-need-to-protect-but-often-do-not/
5. Adams, A., & Sasse, M. A. (1999). Users are not the enemy. Communications of the ACM, 42(12), 40-46. https://dl.acm.org/doi/10.1145/322796.322806
6. Columbus, L. (2019, February 26). 74% Of Data Breaches Start With Privileged Credential Abuse. Retrieved from Forbes website: https://www.forbes.com/sites/louiscolumbus/2019/02/26/74-of-data-breaches-start-with-privileged-credential-abuse
7. Thomas, K., Li, F., Zand, A., Barrett, J., Ranieri, J., Invernizzi, L., ... & Margolis, D. (2017, October). Data breaches, phishing, or malware? Understanding the risks of stolen credentials. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (pp. 1421-1434).
8. IBM. (2019). 2019 Cost of a Data Breach Report. Retrieved from https://databreachcalculator.mybluemix.net/?cm_mc_uid=48790972741115820591575&cm_mc_sid_50200000=34702341582059157577&cm_mc_sid_52640000=98369671582059157583
9. Goodin, D. (2014, January 8). Hackers use Amazon cloud to scrape mass number of LinkedIn member profiles. Retrieved from Ars Technica website: https://arstechnica.com/information-technology/2014/01/hackers-use-amazon-cloud-to-scrape-mass-number-of-linkedin-member-profiles/
10. Johnson, A. (2020, February 3). Even Public, Visible Data on Your Website Can Benefit Hackers [Blog post]. Retrieved from TechyGeeksHome website: https://blog.techygeekshome.info/2020/02/even-public-visible-data-on-your-website-can-benefit-hackers/
References Continued
11. Cisco. (2018, July). Small and Mighty: How Small and Midmarket Businesses Can Fortify Their Defenses Against Today's Threats. Retrieved from https://www.cisco.com/c/dam/en/us/products/collateral/security/small-mighty-threat.pdf
12. SecurityTrails. (2018, November 27). Top 5 Ways to Handle a Data Breach. Retrieved from https://securitytrails.com/blog/top-5-ways-handle-data-breach
13. Gartner. (2018, December 6). Gartner Data Shows 87 Percent of Organizations Have Low BI and Analytics Maturity. Retrieved from gartner.com/en/newsroom/press-releases/2018-12-06-gartner-data-shows-87-percent-of-organizations-have-low-bi-and-analytics-maturity
14. Laney, D. (2018, October 22). Gartner's Enterprise Information Management Maturity Model. Retrieved from Gartner website: gartner.com/document/3236418
15. Verizon. (2019, May). Data Breach Investigation Report. Retrieved from enterprise.verizon.com/resources/reports/dbir
16. What is whaling? - Definition from Techopedia. (n.d.). Retrieved from https://www.techopedia.com/definition/28643/whaling
Appendix
User Stories:
As a Guest, I need to be able to use a GUI and not a CLI.
As a Guest, I need to be able to scan internal files to find data leaks.
As a Guest, I need to receive email notifications of data leaks or breaches.
As a Guest, I need to receive text notifications of data leaks or breaches.
As a Guest, I wish to be able to prioritize the types of sensitive information searched for by the system.
As a Guest, I wish to be able to use regular expressions to customize searches.
As a Developer, I need to be able to use a GUI and not a CLI.
As a Developer, I need to be able to crawl my website to find data leaks.
As a Developer, I need to be able to crawl social media to find data leaks.
As a Developer, I need to be able to crawl pastebins to find data leaks.
As a Developer, I need to be able to crawl my GitHub repository to find data leaks.
As a Developer, I need to be able to scan internal files to find data leaks.
As a Developer, I need to receive email notifications of data leaks or breaches.
As a Developer, I need to receive text notifications of data leaks or breaches.
As a Developer, I need to be able to create a profile.
As a Developer, I need to be able to add or remove my website to/from my profile.
As a Developer, I wish to be able to scan files being transmitted.
As a Developer, I wish to be able to prioritize the types of sensitive information searched for by the system.
As a Developer, I wish to be able to provide my public website directory as input for the web crawler.
As a Developer, I wish to be able to use regular expressions to customize searches.
As a Developer, I wish to be able to add or remove my pastebin to/from my profile.
As a Developer, I wish to be able to add or remove my GitHub repository to/from my profile.
As a Website Owner, I need to be able to use a GUI and not a CLI.
As a Website Owner, I need to receive email notifications of data leaks or breaches.
As a Website Owner, I need to receive text notifications of data leaks or breaches.
As a Website Owner, I need to be able to create a profile.
As a Website Owner, I need to be able to add or remove my Developers to/from my profile.
As a Website Owner, I need to be able to add or remove my website to/from my profile.
As a Website Owner, I wish to be able to add or remove my pastebin to/from my profile.
As a Website Owner, I wish to be able to add or remove my GitHub repository to/from my profile.
As an Administrator, I need to be able to add a new user account.
As an Administrator, I need to be able to delete a user account.
As an Administrator, I need to be able to add or remove Developers to/from a Website Owner account.
As an Administrator, I need to be able to add or remove a website to/from a user profile.
As an Administrator, I need to be able to add or remove a pastebin to/from a user profile.
As an Administrator, I need to be able to add or remove a GitHub repository to/from a user profile.
As an Administrator, I wish to be able to determine whether a user is a hacker.
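Two of the user stories above, customizing searches with regular expressions and prioritizing the types of sensitive information, could be combined in a small rule structure. The following is a hedged sketch: the `Rule` record, its fields, and the convention that a lower number means higher priority are assumptions made for the example, not part of the SIDR design.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str       # label shown in reports
    pattern: str    # user-supplied regular expression
    priority: int   # assumption: lower number = reported first


def apply_rules(text, rules):
    """Return (priority, rule name, match) tuples sorted by priority."""
    hits = []
    for rule in rules:
        for m in re.finditer(rule.pattern, text):
            hits.append((rule.priority, rule.name, m.group()))
    return sorted(hits)


if __name__ == "__main__":
    rules = [
        Rule("grade", r"\bGrade: [A-F][+-]?", 2),
        Rule("ssn", r"\b\d{3}-\d{2}-\d{4}\b", 1),
    ]
    for hit in apply_rules("Grade: B+ for 123-45-6789", rules):
        print(hit)
```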