Sensitive Information Detection & Reporting

Feasibility Presentation v2

ODU CS 410 Red Team, April 1, 2020

Scan and crawl for leaks before your data gets breached!

Outline

● Team Biography
● Problem Statement
● Problem Characteristics
● Who is affected?
● ODU as a Case Study
● Who handles the information?
● Threats in Recent Years
● Information/Organization Maturity
● Solution Statement
● Solution Characteristics
● SIDR Software Input/Output Diagram
● SIDR Process Flow
● A Day in the Life of a Customer Process Flow
● A Day in the Life with SIDR Process Flow
● Process Flow Comparison
● Major Functional Component Diagram
● What the solution will not do
● Types of Sensitive Information that this Program will Seek to Find
● Benefits to Customers
● Competition Matrix - Detection
● Competition Matrix - Reporting
● Risk Matrix
● Technical Risk Mitigation
● Customer Risk Mitigation
● Review - Key Points
● References
● References Continued
● Appendix

Feasibility Presentation v2 | Sensitive Information Detection and Reporting | CS 410 | April 1, 2020

Team Biography

Andrew Paterson, Software Developer

Cameron Allen, Project Lead

Evan Mulloy, Web & Software Developer

Dane Bruce, Web & Software Developer

Michael Hewitt, Documentation Specialist

Kasey Howlett, Database Engineer

Problem Statement

For enterprises of any size, the release of sensitive information can provide bad actors unauthorized access to private digital assets, potentially resulting in tarnished reputations and costing millions of dollars in damage.

Problem Characteristics

● Large breaches can develop from small mistakes [1] [2]

● Private or confidential information is sometimes placed in publicly accessible files or web pages [10]

● Public file/web page directories may be shared with individuals who are no longer active [15]

● Many organizations do not know every location that contains their confidential information [11]

IBM Cost of Data Breach Report [8]

Who is affected?

Universities

Small to midsize companies

Research labs

Non-profit organizations

Private medical practices

53% of mid-market companies suffered a breach in 2018 [11]

Case Study

The ODU Computer Science department is a case study

● They run their own network

● Their network is separate from the University’s overarching design

● They have a lot of student contributors to their forward-facing web content, and students make mistakes

• The Systems Group is run by students, and they have a lot of leeway

• The CS 411W pages are student-built

• The graduate project pages are built by students

● ODU must be FERPA-compliant like any other University

• UINs are a unique form of sensitive information

• ODU could be sued if the privacy of UINs or grades was compromised

Who Handles The Information?

○ Registrar

○ Financial Aid

○ Human Resources

○ Finance Department

○ CISOs

○ Sales people

○ Researchers

○ Volunteers

○ Owners

○ Nurses

○ Receptionists

Threats In Recent Years

● Approximately 74% of breaches in 2018 started with privileged credential abuse [6]

● 751,133,653 Google credentials leaked on paste sites and blackhat forums had a valid password match rate of 6.9% [7]

● In 2014, hackers used Amazon EC2 to scrape sensitive data from “hundreds of thousands of member profiles” on LinkedIn [9]

● Automated bots are used to scrape sensitive data [1] [10], such as email addresses to use for ‘spear phishing’ [10] and ‘whaling’ [16]

Information/Organization Maturity

The most important asset of any business is its proprietary data.

“More than 87 percent of organizations are classified as having low business intelligence and analytics maturity” - Gartner, December 2018 [13]

“Through 2019, 10% of organizations will have established operational information stewardship in line-of-business functions.” - Gartner Enterprise Information Management Maturity Model [14]

Gartner Information Management Building Blocks [14]

The SIDR Solution

By employing proprietary tools to scour web content for sensitive data, enterprises will be kept notified of potential data leaks and unauthorized access to sensitive information.

Solution Characteristics

● Mitigates exposure to data leaks that were caused by human error

● Allows for the correction of small mistakes that could potentially become a large data breach

● Allows automated searching of specific local and public-facing locations that could potentially contain confidential information

● Finds private or confidential information that was mistakenly put on public files or web pages

● Allows searching of public file/web page directories that are no longer actively maintained
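
The automated search of local locations listed above can be sketched briefly. This is a minimal illustration in Python (the language the risk section names as a dependency), not the SIDR implementation. The function name `scan_directory` and the single Social Security number pattern are assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical detector: a U.S. SSN written in the form 123-45-6789.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def scan_directory(root):
    """Return (path, line number, matched text) for each SSN-like string under root."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        # errors="ignore" lets the sketch skip bytes that are not valid UTF-8.
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in SSN_PATTERN.finditer(line):
                findings.append((str(path), line_no, match.group()))
    return findings
```

Recording the path and line number with each match gives a report enough context for someone to locate and correct the mistake.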

SIDR Software Input/Output Diagram

Inputs: Web Pages, Files

Output: Alerts and Reporting

Software Process Flow

Current Process Flow

Solution Process Flow

Process Flow Comparison

Major Functional Component Diagram

What the solution will not do

● Erase leaked information

● Change compromised login credentials

● Inform you about who knows your compromised login credentials

● Protect your system against malware

● Perform as an Intrusion Detection System (IDS)

Types of sensitive information that this program will seek to find

Business Information

● Customer Information
  ○ Addresses
  ○ Email addresses
  ○ Identifying information
  ○ Phone numbers
  ○ Contact information
  ○ Payment information

● Company Information
  ○ Company bank account information

Personal Information

● Employee Information
  ○ SSN
  ○ Banking
  ○ Contact information
  ○ Log-in credentials
  ○ SSH keys

● Faculty and Student Information
  ○ UINs
  ○ Grades
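
Several of the categories above have recognizable textual shapes, so a pattern-based detector is one plausible starting point. The sketch below is an assumption-laden illustration, not the SIDR detector: the name `DETECTORS`, the function `classify`, and the specific regular expressions are hypothetical. It covers only a few of the listed types, and UINs are omitted because the slides do not give their format.

```python
import re

# Illustrative patterns only; real-world formats vary more than these allow.
DETECTORS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |OPENSSH |EC )?PRIVATE KEY-----"),
}


def classify(text):
    """Map each detector name to the list of substrings it matched in text."""
    hits = {}
    for name, pattern in DETECTORS.items():
        found = pattern.findall(text)
        if found:
            hits[name] = found
    return hits
```

Keeping the patterns in a dictionary keyed by type would let a customer add, remove, or reprioritize definitions without touching the matching loop.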

Benefits to Customer

Ability to:

• Identify sensitive information that has been exposed on the web
• Scan local files to identify potentially sensitive information
• Customize and prioritize definitions of sensitive information

Reduction of:

• Risk of data loss
• Time elapsed between data leak and leak detection
• Liability for legal violations

Identification of:

• Sources of repetitive leaks
• Specific areas of the file system
• Individuals with repeat offenses
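
Identifying repeat sources amounts to counting findings by some key, such as a directory or an author. The following is a small sketch under that assumption. The function `repeat_sources` and its `min_count` parameter are hypothetical names, and the slides do not say how SIDR attributes a leak to a source.

```python
from collections import Counter


def repeat_sources(findings, min_count=2):
    """Return (key, count) pairs for keys that appear at least min_count times.

    findings is an iterable of hashable keys, such as the directory or the
    author associated with each detected leak.
    """
    counts = Counter(findings)
    return [(key, n) for key, n in counts.most_common() if n >= min_count]
```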

Competition Matrix - Detection

Competition Matrix - Reporting

Risk Matrix

Probability (rows) versus severity (columns):

Probability | Negligible | Marginal   | Moderate | Major | Critical
Certain     |            |            |          |       |
Likely      |            | T5         |          |       |
Possible    | T7         | C5, C4, C3 | T6       |       | T1
Unlikely    |            |            | T2, T4   | C2    | C1
Rare        |            |            |          |       | T3

Technical Risks:

T1 (“Incomplete search”): Sensitive information isn’t found, and because of that, a data breach occurs
T2 (“Spider trap”): The Web Crawler enters a spider trap
T3 (“AI technology”): AI technology exceeds the capabilities of traditional software development
T4 (“Insufficient power”): Computational power to process file/webpage analysis is not available
T5 (“Broken functionality”): Updates to Python or common operating systems break functionality
T6 (“Insufficient storage”): Storage needs during operation become greater than storage capacity
T7 (“Blank search results”): No sensitive information is found for an extended period, possibly because it is not there

Customer Risks:

C1 (“Hacker customer”): A hacker gains access to the database of found sensitive information that hasn’t been resolved yet
C2 (“False positives”): Excessive false positives cause customers to lose interest in the product
C3 (“Unsatisfactory GUI”): Customers aren’t satisfied with the GUI
C4 (“Customer inactivity”): Customers ignore reports of found information
C5 (“Customer tardiness”): Customers do not fix leakage in a timely manner, and their data is stolen
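
One way to compare the risks in the matrix is to reduce each to a single number. The sketch below multiplies severity by probability, which is a common convention but an assumption here, since the slides do not state a scoring formula. The (severity, probability) pairs are the ones printed beside each risk ID in the mitigation lists that follow. The function name `ranked` is hypothetical.

```python
# (severity, probability) on the 1-5 scales of the risk matrix:
# severity 1 = Negligible ... 5 = Critical; probability 1 = Rare ... 5 = Certain.
RISKS = {
    "T1": (5, 3), "T2": (3, 2), "T3": (5, 1), "T4": (3, 2),
    "T5": (2, 4), "T6": (3, 3), "T7": (1, 3),
    "C1": (5, 2), "C2": (4, 2), "C3": (2, 3), "C4": (2, 3), "C5": (2, 3),
}


def ranked(risks):
    """Sort risk IDs by severity * probability, highest first (ties broken by ID)."""
    return sorted(risks, key=lambda rid: (-risks[rid][0] * risks[rid][1], rid))
```

Under this assumed scoring, T1 (incomplete search) scores 15 and ranks first, followed by C1 at 10 and T6 at 9.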

Technical Risk Mitigations

T1 (5,3) Incomplete search: Allow customers the ability to fine-tune what is “sensitive” information to improve search algorithms

T2 (3,2) Spider trap: Maintain a list of web pages that have already been crawled

T3 (5,1) AI technology: Adjust the scope to focus on what is possible with traditional software development

T4 (3,2) Insufficient power: Transfer responsibility to the customer

T5 (2,4) Broken functionality: Continue monitoring the risk and release frequent patches

T6 (3,3) Insufficient storage: Transfer responsibility to the customer

T7 (1,3) Blank search results: Increase sensitivity or accept that all sensitive information was found and is not currently present
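
The T2 mitigation, keeping a list of pages already crawled, can be illustrated with a breadth-first crawl over a caller-supplied link function. This is a sketch, not the SIDR crawler: `crawl`, `get_links`, and `max_pages` are hypothetical names. The page cap is an added assumption, because a visited list alone stops link cycles but not traps that generate endless new URLs.

```python
from collections import deque


def crawl(start_url, get_links, max_pages=1000):
    """Breadth-first crawl that visits each URL at most once.

    get_links is a caller-supplied function returning the URLs linked from a page.
    max_pages caps the crawl in case a site keeps producing URLs never seen before.
    """
    visited = set()
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue  # already crawled: this check is what breaks link cycles
        visited.add(url)
        order.append(url)
        for link in get_links(url):
            if link not in visited:
                queue.append(link)
    return order
```

Passing the link function as a parameter keeps the sketch runnable without network access. A real crawler would fetch each page and extract its links at that point.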

Customer Risk Mitigations

C1 (5,2) Hacker customer: Strict authentication and authorization controls within the SIDR solution and the associated databases - at least equivalent to customer’s standards for protecting sensitive databases

C2 (4,2) False positives: Allow the user to specify a level of sensitivity for the lexical analyzer

C3 (2,3) Unsatisfactory GUI: Use feedback from surveys and focus groups to improve the GUI in future updates

C4 (2,3) Customer inactivity: Transfer responsibility to the customer

C5 (2,3) Customer tardiness: Transfer responsibility to the customer
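
The C2 mitigation, a user-selected sensitivity level, could work by attaching a confidence value to each finding and filtering against a threshold. The sketch below assumes that design. The `Finding` fields, the three level names, and the numeric thresholds are all hypothetical and do not come from the slides.

```python
from collections import namedtuple

# confidence runs from 0.0 (weak guess) to 1.0 (near-certain) and is
# assumed to be assigned by whichever detector produced the finding.
Finding = namedtuple("Finding", ["kind", "text", "confidence"])

# Hypothetical mapping from a user-facing sensitivity level to a minimum confidence.
THRESHOLDS = {"low": 0.9, "medium": 0.6, "high": 0.3}


def filter_findings(findings, sensitivity="medium"):
    """Keep only findings whose confidence meets the chosen sensitivity level."""
    minimum = THRESHOLDS[sensitivity]
    return [f for f in findings if f.confidence >= minimum]
```

A lower sensitivity reports fewer, more certain matches and so reduces false positives. Raising it trades more noise for a smaller chance of the incomplete search described in T1.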

SIDR: Your solution for finding sensitive data before the bad actors!

Benefits:

● By automating the search for data leaks, valuable time is saved

● By finding data leaks early, potential data breaches are stopped

● By preventing potential data breaches, your company saves money and preserves its reputation

References

1. Winder, D. (2019, August 20). Data Breaches Expose 4.1 Billion Records In First Six Months Of 2019. Retrieved from Forbes website: https://www.forbes.com/sites/daveywinder/2019/08/20/data-breaches-expose-41-billion-records-in-first-six-months-of-2019/

2. Funke, D. (2019, September 23). Public data breaches have increased over the past decade [Graph]. Retrieved from PolitiFact website: https://www.politifact.com/article/2019/sep/23/numbers-how-common-are-data-breaches-and-what-can-/

3. Kirby, D. (2018, May 21). Five Types of Sensitive Data Almost All Companies Handle [Blog post]. Retrieved from https://kirbside.com/blog/five-types-of-sensitive-data-almost-all-companies-handle/

4. Steinberg, J. (2018, April 28). 12 Types Of Data That Businesses Need To Protect But Often Do Not. Retrieved from https://josephsteinberg.com/12-types-of-data-that-businesses-need-to-protect-but-often-do-not/

5. Adams, A., & Sasse, M. A. (1999). Users are not the enemy. Communications of the ACM, 42(12), 40-46. https://dl.acm.org/doi/10.1145/322796.322806

6. Columbus, L. (2019, February 26). 74% Of Data Breaches Start With Privileged Credential Abuse. Retrieved from Forbes website: https://www.forbes.com/sites/louiscolumbus/2019/02/26/74-of-data-breaches-start-with-privileged-credential-abuse

7. Thomas, K., Li, F., Zand, A., Barrett, J., Ranieri, J., Invernizzi, L., ... & Margolis, D. (2017, October). Data breaches, phishing, or malware? Understanding the risks of stolen credentials. In Proceedings of the 2017 ACM SIGSAC conference on computer and communications security (pp. 1421-1434).

8. IBM. (2019). 2019 Cost of a Data Breach Report. Retrieved from https://databreachcalculator.mybluemix.net/?cm_mc_uid=48790972741115820591575&cm_mc_sid_50200000=34702341582059157577&cm_mc_sid_52640000=98369671582059157583

9. Goodin, D. (2014, January 8). Hackers use Amazon cloud to scrape mass number of LinkedIn member profiles. Retrieved from Ars Technica website: https://arstechnica.com/information-technology/2014/01/hackers-use-amazon-cloud-to-scrape-mass-number-of-linkedin-member-profiles/

10. Johnson, A. (2020, February 3). Even Public, Visible Data on Your Website Can Benefit Hackers [Blog post]. Retrieved from TechyGeeksHome website: https://blog.techygeekshome.info/2020/02/even-public-visible-data-on-your-website-can-benefit-hackers/

References Continued

11. Cisco. (2018, July). Small and Mighty: How Small and Midmarket Businesses Can Fortify Their Defenses Against Today's Threats. Retrieved from https://www.cisco.com/c/dam/en/us/products/collateral/security/small-mighty-threat.pdf

12. SecurityTrails. (2018, November 27). Top 5 Ways to Handle a Data Breach. Retrieved from https://securitytrails.com/blog/top-5-ways-handle-data-breach

13. Gartner. (2018, December 6). Gartner Data Shows 87 Percent of Organizations Have Low BI and Analytics Maturity. Retrieved from gartner.com/en/newsroom/press-releases/2018-12-06-gartner-data-shows-87-percent-of-organizations-have-low-bi-and-analytics-maturity

14. Laney, D. (2018, October 22). Gartner's Enterprise Information Management Maturity Model. Retrieved from Gartner website: gartner.com/document/3236418

15. Verizon. (2019, May). Data Breach Investigation Report. Retrieved from enterprise.verizon.com/resources/reports/dbir

16. What is whaling? - Definition from Techopedia. (n.d.). Retrieved from https://www.techopedia.com/definition/28643/whaling

Appendix

User Stories:

As a Guest, I need to be able to use a GUI and not a CLI.
As a Guest, I need to be able to scan internal files to find data leaks.
As a Guest, I need to receive email notifications of data leaks or breaches.
As a Guest, I need to receive text notifications of data leaks or breaches.
As a Guest, I wish to be able to prioritize the types of sensitive information searched for by the system.
As a Guest, I wish to be able to use regular expressions to customize searches.

As a Developer, I need to be able to use a GUI and not a CLI.
As a Developer, I need to be able to crawl my website to find data leaks.
As a Developer, I need to be able to crawl social media to find data leaks.
As a Developer, I need to be able to crawl pastebins to find data leaks.
As a Developer, I need to be able to crawl my GitHub repository to find data leaks.
As a Developer, I need to be able to scan internal files to find data leaks.
As a Developer, I need to receive email notifications of data leaks or breaches.
As a Developer, I need to receive text notifications of data leaks or breaches.
As a Developer, I need to be able to create a profile.
As a Developer, I need to be able to add or remove my website to/from my profile.
As a Developer, I wish to be able to scan files being transmitted.
As a Developer, I wish to be able to prioritize the types of sensitive information searched for by the system.
As a Developer, I wish to be able to provide my public website directory as input for the web crawler.
As a Developer, I wish to be able to use regular expressions to customize searches.
As a Developer, I wish to be able to add or remove my pastebin to/from my profile.
As a Developer, I wish to be able to add or remove my GitHub repository to/from my profile.

As a Website Owner, I need to be able to use a GUI and not a CLI.
As a Website Owner, I need to receive email notifications of data leaks or breaches.
As a Website Owner, I need to receive text notifications of data leaks or breaches.
As a Website Owner, I need to be able to create a profile.
As a Website Owner, I need to be able to add or remove my Developers to/from my profile.
As a Website Owner, I need to be able to add or remove my website to/from my profile.
As a Website Owner, I wish to be able to add or remove my pastebin to/from my profile.
As a Website Owner, I wish to be able to add or remove my GitHub repository to/from my profile.

As an Administrator, I need to be able to add a new user account.
As an Administrator, I need to be able to delete a user account.
As an Administrator, I need to be able to add or remove Developers to/from a Website Owner account.
As an Administrator, I need to be able to add or remove a website to/from a user profile.
As an Administrator, I need to be able to add or remove a pastebin to/from a user profile.
As an Administrator, I need to be able to add or remove a GitHub repository to/from a user profile.
As an Administrator, I wish to be able to determine whether a user is a hacker.
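
The email-notification stories above could be served by composing a summary message from a batch of findings. The sketch below uses Python's standard `email.message.EmailMessage` and only builds the message; it does not send it. The function name `build_alert`, the placeholder sender address, and the (location, kind) shape of a finding are assumptions for illustration.

```python
from email.message import EmailMessage


def build_alert(recipient, findings, sender="sidr-alerts@example.com"):
    """Build (but do not send) an email summarizing findings.

    findings is a list of (location, kind) pairs. The sender address is a
    placeholder; delivery (for example via smtplib) is left to the caller.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"SIDR alert: {len(findings)} potential leak(s) found"
    lines = [f"- {kind} at {location}" for location, kind in findings]
    msg.set_content("\n".join(lines))
    return msg
```

The body lists only the type and location of each finding. Repeating the sensitive value itself in an email would create another copy of the leak.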