mobilogleak: a study on data leakage caused by poor

99
i MobiLogLeak: A Study on Data Leakage Caused by Poor Logging Practices Rui Zhou A Thesis in the Department of Electrical and Computer Engineering Presented in Partial Fulfillment of the Requirements for the Degree of Master of Applied Science (Electrical and Computer Engineering) at Concordia University Montreal, Quebec, Canada July 2020 © Rui Zhou, 2020

Upload: others

Post on 31-Mar-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

i

MobiLogLeak: A Study on Data Leakage

Caused by Poor Logging Practices

Rui Zhou

A Thesis

in the Department of

Electrical and Computer Engineering

Presented in Partial Fulfillment of the Requirements

for the Degree of Master of Applied Science (Electrical and Computer Engineering)

at

Concordia University

Montreal, Quebec, Canada

July 2020

© Rui Zhou, 2020

ii

CONCORDIA UNIVERSITY

School of Graduate Studies

This is to certify that the thesis prepared

By: Rui Zhou

Entitled: MobiLogLeak: A Study on Data Leakage Caused by Poor Logging Practices

and submitted in partial fulfillment of the requirements for the degree of

Master of Applied Science (Electrical and Computer Engineering)

complies with the regulations of the University and meets the accepted standards with respect to

originality and quality.

Signed by the final examining committee:

_______________________________________________ Chair

Dr. Yan Liu

_______________________________________________ Examiner

Dr. Amr Yousef

_______________________________________________ Examiner

Dr. Yan Liu

_______________________________________________ Supervisor

Dr. Abdelwahab Hamou-Lhadj

Approved by: ________________________________________________

Dr. Michelle Nokken, Graduate Program Director

________________, 2020 __________________________________

Dr. Mourad Debbabi, Interim Dean,,

Gina Cody School of Engineering &

Computer Science

iii

Abstract

MobiLogLeak: A Study on Data Leakage

Caused by Poor Logging Practices

Rui Zhou

Logging is an essential software practice that is used by developers to debug, diagnose and audit

software systems. Despite the advantages of logging, poor logging practices can potentially leak

sensitive data. The problem of data leakage is more severe in applications that run on mobile

devices, since these devices carry sensitive identification information ranging from physical

device identifiers (e.g., IMEI MAC address) to communications network identifiers (e.g., SIM, IP,

Bluetooth ID), and application-specific identifiers related to the location and accounts of users.

This study explores the impact of logging practices on data leakage of such sensitive information.

Particularly, we want to investigate whether logs inserted into an application code could lead to

data leakage. While studying logging practices in mobile applications is an active research area,

to our knowledge, this is the first study that explores the interplay between logging and security in

the context of mobile applications for Android. We propose an approach called MobiLogLeak that

identifies log statements in deployed apps that leak sensitive data. MobiLogLeak relies on taint

flow analysis. Among 5,000 Android apps that we studied, we found that 200 apps leak sensitive

data through logging.

iv

Acknowledgments

During the period of my master, I had the opportunity to learn from many excellent experts. They

gave me a lot of advice and help with my thesis.

From the bottom of my heart, I would like to thank my supervisor, Dr. Abdelwahab Hamou-Lhadj,

for the support and advice he gave me throughout this whole research. All the time, he shows

confidence in me, listens to my thoughts, keeps guiding me with weekly meetings, helps and

corrects me no matter when I meet him. I am very appreciated for his advising and extensively

reviewing my documents. He has influenced me a lot in the field of research.

Many thanks to Dr. Haipeng Cai from the Department of Electrical Engineering and Computer

Science at Washington State University for his assistants and instructions in the process of

experiment. He also provides me with clear ideas on my topic.

Then, I would like to thank Kobra Khanmohammadi, the PhD student in our lab, for sharing her

knowledge and tool to access the data-set in my experiment. In addition, I would like to thank

Steven Arzt, a researcher at the Fraunhofer Institute for Secure Information Technology (SIT) in

Darmstadt for helping me to extend the functions of the detection tool.

Also, I would like to thank all the lab mates for their help. They gave me lots of suggestions about

researching and provides many useful tips to help me to complete the tasks.

Finally, I would like to thank my family. Although they lived far away from me, they gave me

many supports and confidence during my master studies. With their encouragement, I become

more and more stronger to face the difficulties and challenges. Thank you all!

v

List of Contents

List of Figures............................................................................................................................. viii

List of Listings .............................................................................................................................. ix

List of Tables ................................................................................................................................. x

Chapter 1 - Introduction .............................................................................................................. 1

Objective ............................................................................................................................... 1

Thesis Outline ....................................................................................................................... 2

Related publications .............................................................................................................. 3

Chapter 2 - Background ............................................................................................................... 4

Android Architecture ............................................................................................................ 4

Logging in Android Development ........................................................................................ 6

Literature Review of Existing Log Analysis Studies............................................................ 9

2.3.1 Logging practices ........................................................................................................... 9

2.3.2 Quality of Logs ............................................................................................................ 11

2.3.3 Android Vulnerability Analysis ................................................................................... 13

Summary ............................................................................................................................. 16

Chapter 3 - Mobilelike Approach.............................................................................................. 17

Overview ............................................................................................................................. 17

vi

Converting App APK to Jimple code ................................................................................. 18

Taint Flow Analysis ............................................................................................................ 20

Generating the Source-Sink Log-Related Paths ................................................................. 21

Context Analysis of Log-Related Paths .............................................................................. 23

Types of Log-Related Data Leakage .................................................................................. 29

Summary ............................................................................................................................. 31

Chapter 4 - Evaluation ............................................................................................................... 32

Dataset Description ............................................................................................................. 32

Context Analysis Results .................................................................................................... 35

Taint Flow Analysis Results ............................................................................................... 37

Threats to validity ............................................................................................................... 48

Limitations .......................................................................................................................... 48

Chapter 5 - Conclusion and Future Work ............................................................................... 50

Research Contributions ....................................................................................................... 50

Opportunities for Further Research .................................................................................... 50

Appendix A. Results of the experiments ................................................................................ 52

Table A1. 200 Taint Apps Dataset ....................................................................................... 52

Appendix B. Code Snippet for Cases ..................................................................................... 60

Listing B1. Network ................................................................................................................. 60

Listing B2. Account .................................................................................................................. 63

vii

Listing B3. Location ................................................................................................................. 67

Listing B4. Database ................................................................................................................. 77

Bibliography ................................................................................................................................ 82

viii

List of Figures

Figure 2.1 Android Apps Development Process ............................................................................ 4

Figure 2.2 Android Components .................................................................................................... 5

Figure 3.1 Overview of Our Approach ......................................................................................... 18

Figure 3.2 Source code and its corresponding Jimple code .......................................................... 19

Figure 3.3 Taint flow analysis example ........................................................................................ 20

Figure 3.4 Example of a taint flow graph ..................................................................................... 22

Figure 3.5 Log Filter in Previous Research .................................................................................. 22

Figure 3.6 Overview of Context Process ...................................................................................... 25

Figure 3.7 App Detail in Google Play .......................................................................................... 27

Figure 3.8 Context Statistics Level ............................................................................................... 28

Figure 3.9 Source Categorization ................................................................................................. 29

Figure 4.1 Distribution of apps based in the dataset ..................................................................... 32

Figure 4.2 Distribution of ALRS based on the number of installations ..................................... 34

Figure 4.3 Android API – Network: getDeviceId......................................................................... 38

Figure 4.4 Android API - Account ............................................................................................... 40

Figure 4.5 Android API – Loaction getLatitude ........................................................................... 42

Figure 4.6 Android API – Location getLongitude ........................................................................ 43

Figure 4.7 ALRS Apps Source Categories ................................................................................... 45

Figure 4.8 Android API - Database getString............................................................................... 46

Figure 4.9 Android API - Database getColumnIndex .................................................................. 47

ix

List of Listings

Listing 1. Log Message Example ................................................................................................... 6

Listing 2: Performance Example .................................................................................................... 9

Listing 3: Log Snippet Example ................................................................................................... 24

Listing 4: Get Component Type ................................................................................................... 26

Listing 5: CatSource: Examples of Different Categories ............................................................. 31

Listing 6: Code Snippet for Case Network Device ID ................................................................. 38

Listing 7: Code Snippet for Case Account Name ......................................................................... 39

Listing 8: Code Snippet for Case Location latitude and longitude ............................................... 42

Listing 9: Code Snippet for Case Database Password .................................................................. 44

x

List of Tables

Table 4-1 Categorization of apps in the dataset with log-related sinks ........................................ 33

Table 4-2 Log context analysis ..................................................................................................... 35

Table 4-3 Components for each app in different types ................................................................. 36

1

Chapter 1 - Introduction

Objective

Software logging is an important software development practice used by developers to gain insight

into the behavior of software systems are run-time [Miranskky 16, Zhu 15]. Log messages printed

during the execution of a program are often the only data source available for developers to

diagnose program failures [Khatuya 18], detect vulnerabilities and malware infections [Yen 13],

detection of system anomalies [Islam 18], etc. Developers of mobile applications (apps), the focus

of this thesis, also use logging to keep track of important events that can help them debug problems

later. Despite the importance of log data, the practice of logging is still largely ad hoc and

somewhat arbitrary [Chen 17b].

There exist studies that examine the practice of logging in mobile app development with a focus

on examining the pervasiveness of logging in traditional and mobile applications, the evolution of

logging, logging anti-patterns and bed smells, etc. For examples, Yuan et al. [Yuan 2012a]

conducted an empirical study on the logging practices in four open source C++ software projects

and obtained ten interesting findings on the logging practices. A replication study was conducted

by Chen et al. [Chen 17a] focusing on Java systems. The authors examined the logging practices

in 21 Java projects from the Apache Software Foundation [ASF 16].

2

In this thesis, we argue that poor logging practices may lead to more serious problems than quality

issues such as exposing user private and personal information and other sensitive data. While

studying logging practices in mobile apps is an active research area, to our knowledge, this is the

first study that explores the interplay between logging and security (more precisely data privacy)

in the context of mobile applications for Android.

Thesis Outline

The rest of the thesis is structured as follows:

Chapter 2 – Background

This chapter introduces the technologies we used in our thesis. In the first section, we introduce

the fundamentals of Android framework. We explain the four main Android components. Then,

we review the concept of logging in Android apps. In the second section, we summarize previous

work related to the practice of logging in software development. The chapter continues with a

detailed literature review, followed by a general discussion.

Chapter 3 - MobiLogLeak Approach

In this chapter, we present our approach, called MobiLogLeak, for detecting logs in Android apps

that can be a source of data leakage. We start the chapter with an overview of the approach and

continue with describing each component of our methodology.

3

Chapter 4 - Evaluation

In this chapter, we show the effectiveness of MobiLogLeak when applied to more than 5,000 apps.

We discuss the main results and conclude with lessons learned.

Chapter 5 – Conclusion

In this chapter, we revisit the main contributions of this thesis. We conclude with comments about

our project and some opportunities for future research.

Related publications

Rui Zhou, Mohammad Hamdaqa, Haipeng Cai, and Abdelwahab Hamou-

Lhadj, "MobiLogLeak: A preliminary study on data leakage caused by poor logging

practices," In: Proceedings of the 27th International Conference on Software Analysis,

Evolution, and Reengineering (SANER), 2020, pp. 577-58. [Zhou 20]

4

Chapter 2 - Background

Android Architecture

Figure 2.1 shows the typical Android development process. Android development is mainly in

Java, Kotlin, and C++. After implementing the functionalities of the app, the code is compiled by

the Android SDK (Software Development Kit). The result is an archive file with apk extension,

which can then be deployed on an Android device.

Figure 2.1 Android Apps Development Process

5

Figure 2.2 Android Components

Android applications have different categories of components that may depend on each other. As

shown in Figure 2.2, an Android app can have Activities, Services, Broadcast Receivers, and

Content Providers [Android 17]. Each component has its own lifecycle, which defines the

functions to create, start, pause, restart, and end the component.

Activity: The Activity component is used to represent the app user interface. Each screen or page

in the app is associated to one activity. Android supports various layouts that enable developers to

create powerful user interface capabilities.

Service: The Service component is used to implement services that run in the background such

downloading a large file, playing background music, etc. Services do not require user interaction

and hence do not necessitate a user interface.

Broadcast Receivers: The Broadcast Receiver component is used when the app needs to receive

events from other apps. In some cases, even though the app is not running in the background, the

6

system could still send the broadcasts from other apps to it. For instance, some Android apps could

set notifications to users in order to remind them of important events.

Content Provider: The content Provider is used to access the data in the file system. This data

may be located in a database, XML file, or stored on remote storage devices.

Logging in Android Development

A logging statement is typically composed of an object, a verbosity level, a static text, and/or a

dynamic content. An example of a logging statement is shown in Listing 1, “Log” is the object.

“info” is the verbosity level, which means information, and “log statement” is the static text. A

dynamic content could refer to variables (not shown in this example).

Log.info("log statement");

Staticinvoke <android.util.Log: int info(java.lang.String,java.lang.String)> ("log statement");

Listing 1. Log Message Example

The object is provided by the class android.util.Log. It is the default logging library for Android.

Similar to the print statement (e.g. System.out.print), log messages will appear in the backend

terminal when the program executes the statement. However, the default logging library does not

support any formatting capabilities, making it difficult for developer to process the resulting log

files. It just shows the value of its parameters. This is why many developers prefer to import other

7

advanced logging libraries such as Timber1, ZLog2, Logger3, which provide better formatting

features than the standard logging library, hence facilitating post-mortem analysis of log messages.

For example, these libraries use different coloring schemes to distinguish among various verbosity

levels.

Even though there exist many logging libraries in Android, most of them support five verbosity

level: Log.v(), Log.d(), Log.i(), Log.w() and Log.e(), which stand for VERBOSE, DEBUG, INFO,

WARN, ERROR. Among these levels, the Verbose and Debug verbosity levels should only be

used during development to help developers debug their programs before releasing them. On the

other hand, the Info, Warn and Error log messages are part of the program after deployment.

The static text and dynamic content are used by developers to output information about the

program. In most cases, this information is used for debugging, which helps check errors and their

causes. This explains why the majority of logging statements are usually found in try/catch blocks,

for printing exceptions and error messages. In addition, we found that in Android development,

developers tend to insert logging statements before the start of services to help monitor the

execution of services. In addition, logs help the developers to understand the code and check the

performance of as system [Zeng 19]. For example, in Listing 2, the log statement in Line 22 outputs

variable $r3, which refers to variable $r1 (shown in Line 20) of type is TimeUnit. Also, $r3

includes the static text, “Closing connections idle longer than”. From this text, we can see that

1https://timber.io

2https://github.com/HardySimpson/zlog

3https://github.com/orhanobut/logger

8

variable $r1 is sued to measure the time of closing connections; this information can be used to

analyze the performance of the program.

Class: org.apache.http.impl.conn.PoolingHttpClientConnectionManager

1. public void closeIdleConnections(long, java.util.concurrent.TimeUnit)

2. {

3. org.apache.http.impl.conn.PoolingHttpClientConnectionManager $r0;

4. long $l0;

5. java.util.concurrent.TimeUnit $r1;

6. boolean $z0;

7. java.lang.StringBuilder $r2;

8. java.lang.String $r3;

9. org.apache.http.impl.conn.CPool $r4;

10. $r0 := @this: org.apache.http.impl.conn.PoolingHttpClientConnectionManager;

11. $l0 := @parameter0: long;

12. $r1 := @parameter1: java.util.concurrent.TimeUnit;

13. $z0 = staticinvoke <android.util.Log: boolean isLoggable(java.lang.String,int)>("HttpClient", 3);

14. if $z0 == 0 goto label1;

15. $r2 = new java.lang.StringBuilder;

16. specialinvoke $r2.<java.lang.StringBuilder: void <init>()>();

17. $r2 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>("Closing c

onnections idle longer than ");

18. $r2 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.StringBuilder append(long)>($l0);

19. $r2 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(" ");

20. $r2 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.Object)>($r1);

21. $r3 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.String toString()>();

22. staticinvoke <android.util.Log: int d(java.lang.String,java.lang.String)>("HttpClient", $r3);

23. label1:

24. ...

9

25. return;

26. }

Listing 2: Performance Example

App developers use different local directories to store log information including “/data/local/tmp/”,

“/data/tmp/”, etc. Normally, the app developers use Logcat or ADB (Android Debug Bridge) to

retrieved and process logs. These tools can get the logs by checking the libraries “/dev/log/main”.

Literature Review of Existing Log Analysis Studies

2.3.1 Logging practices

Yuan et al. [Yuan 12a] surveyed the logging practices in four open-source projects. They deep

analyzed which parts of logging code cost developers’ most of time to fix. To achieve this

objective, they build a prototype with some recurring error patterns to evaluate the feasibility of

anticipating the introduction of problematic logging code. In addition, they provide a simple

checker which could detect unknown problematic logging statements and confirmed the feasibility

of leveraging historical data to improve logging code.

Chen et al. [Chen 17a] conducted a study focusing on Java systems. The authors examined the

logging practices in 21 Java projects from the Apache Software Foundation. They addressed five

questions related to logging pervasiveness, bug reports, log modification, characteristics of

consistent updates and after-thought updates. They set several criteria to filter the log statements

and calculated the log density, logging insertion, log deletion, log move, and log update. Their

research shows that despite the pervasiveness of logging, the logging practice remains ad hoc and

arbitrary.

10

Zeng et al. [Zeng 19] conducted a research study on the characteristics of logging practices in

1,444 open source mobile apps from the F-droid repository [F-Droid 17]. They checked the

logging density, rational behind logging, and performance impact. The authors compared the

results to two previous studies [Yuan 12a] [Chen 17a] focusing on C++ and Java applications.

They found that logs are less in mobile apps than in server and desktop traditional software projects.

Then, they identified several reasons why developers put the log statements in the code: Debug,

Anomaly detection, assisting in development, bookkeeping, performance, change for consistency,

customized logging library and from third-party library.

Shang et al. [Shang 14] proposed an approach where they used the development knowledge to

help understand the meaning and impact of each specific lines. The approach starts by examining

the email threats from Hadoop, Cassandra and Zookeeper. They spend 100 hours on manually

detection, and they found 15 email inquires and 73 inquires from web research. Then, they

summarized five types of development knowledge which are applied a lot in industry, including

meaning, cause, context, impact and solution, to help understand log lines in code commits, issue

reports and other development repositories. In conclusion, they argued it’s feasible to use the

derived development knowledge to resolve the real-life log inquires.

In another study, Pecchia et al. [Pecchia 15] surveyed the event logging practices in the area of

critical industrial development. They corporated with Selex ES, a company focus on electronic

and information solutions for critical system. They want to answer: how do developers work? Why

do developers work? How do industry practices impact event logging? They did the experiment

by accessing the source code of software, inspecting about 2.3 million log entries and getting the

feedback directly from the development team. Their research contains programming practices,

11

logging objectives and issue impacting log analysis. They get 7 findings from their analysis and

their work contribute a lot to improve the event logging reengineering tasks at Selex ES.

Fu et al. [Fu 14] run a study with Microsoft developers to investigate where software developers

put log statement in industrial systems. The study focuses on three research questions: What

categories of code snippets are logged? What factors are considered for logging? Is it possible to

automatically determine where to log? In their experiment, they accessed the source code of 2

Microsoft systems. Also, they used a questionnaire survey with 54 experienced Microsoft

developers. They proposed six patterns of where the logging statements are located, and their

prediction accuracy is up to 90% F-Score. Their results suggest that it is feasible to predict the

logging statement location using a classification model.

Another study was conducted by Li et al. [Li 17a], where the authors proposed suggestions to

developers for better classification of new log statements with a focus on log levels. Normally, log

levels include fatal, debug, info, warn and error. However, in their studies, software developers

often have problems deciding which level they should give to the new statements. Therefore, they

investigated the development history of Hadoop, Directory Server, Hama and Qpid to survey the

log level practices and build a classifier that predict the adequate log level based on historical data.

2.3.2 Quality of Logs

Cinque et al. [Cinque 10] proposed an approach to prove the effectiveness of log statement when

software fault. They used the G-SWFIT technique [Durase 06], which could inject faults to

applications, to cause the failure of tested software. Then, they check the log files which keep track

of this problems and verify whether the log files could help detect the faults and help resolve them.

12

They tested Apache server, TAO open data Distribution Service and MySQL. In their result, 60%

failures are caused without leaving log trace. They hope their work will contribute to the

improvement of logging mechanisms. Cinque et al. [Cinque 12] proposed a rule-based approach

after their first study, which aims at improving the logs effectiveness of analyze software failures.

This approach works during the development time and give several criteria for the placement of

logging statements in source code. They tested this approach and it detect about 12,500 software

fault injection in real-world systems.

Chen et al. [Chen 17a] concentrated on the quality of logging and identified logging anti-patterns.

In their work, they identified six types of misuse of logs, including null-able objects, explicit cast,

wrong verbosity level, logging code smells and malformed output. In the logging code smells, they

detected two kind of duplication. They manually examined 352 pairs of independently changed

logging code snippets from ActiveMQ, Hadoop, and Maven. They provided a tool called

LCAnalyzer, which helps detect these anti-patterns.

One of logging anti-pattern from Chen et al. [Chen 17a], duplication, is deeply studied by Li et al.

[Li 19]. They did the research in 4 open-source system (Hadoop, CloudStack, ElasticSearch and

Cassandra) and they manually studied more than 3,000 duplicate logging statements. Furthermore,

they summarized 5 patterns of duplication logging code smells. Then, they verified their results

by contacting the developers and get their feedback. After research, they provide the tool named

DLFinder, which could conduct automatically log static analysis in duplication. Next, they tested

their tool by applying it in another 2 systems, Camel and Wicket. In their results, DLFinder

reported 82 duplicated code smell instances and in the end, all of them are fixed by the developers.

13

Another tool named “LOGADVISOR” was proposed by Zhu et al. [Zhu 15] by applying Machine

Learning. The tool is used to guide software developers on to place the new log statements. To

evaluate the feasible of their tool, LOGADVISOR, they tested it on two systems from Microsoft

and two open-source projects which are maintained in GitHub. Their results show that they can

make provide good recommendations to developers on how to write log statement.

Khanmohammadi et al. [Khanmohammadi 19] conducted research on Android repackaged apps,

which are considered as one the top 10 risks in mobile security [OWASP 16]. They tested more

than 15,000 apps from AndroZoo [Li 17b] and studied the motivation of developers and users of

repacked apps. Also, they detected the factors which determine the apps to be repackaged and the

ways how these apps are repackaged. Their insights can be of a great help to security experts. In

addition, a novel app indexing scheme was proposed to minimize the number of comparisons

needed to detect repackaged apps in app stores.

2.3.3 Android Vulnerability Analysis

Software systems are known to have vulnerabilities that can be exploited by attackers [Murtaza

16]. In the context of Android apps, there is a large body of research on app vulnerabilities. In this

thesis, we present some studies related to this thesis. Cai et al. [Cai 17] created a software toolkit,

called DroidFax, that is designed to assist developers in program comprehension of Android

applications. This tool shows the app characterization in multiple dimensions and views.

Particularly, DroidFax provides all sorts of statistics about an Android app for quality assessment.

The authors applied the tool to the analysis of 125 apps selected randomly from Google Play and

concentrate on their dynamic characteristics. Also, they used it on 610 sample apps of malware.

14

Their results provide new insights about Android app behavior. It also helps provide a

comprehensive understanding of the code structure of Android apps.

Lu et al. [Lu 12] proposed an approach to automatically detect component hijacking vulnerability

in Android apps. They used a static analysis framework called CHEX (Component Hijacking

Examiner), which operates on the system dependency graph, extracted from Android app bytecode.

The graph is analyzed the detect possible hijack-enabling flow paths. The authors evaluated their

approach on 5,486 Android apps and found 254 apps with potential component hijacking

vulnerabilities. In their result, CHEX spent 37.02 seconds on each Android app in average,

concluding that the approach can scale up to larger datasets.

Huang et al. [Huang 14b] proposed a novel approach, AsDroid, to check stealthy behaviours in

Android apps that may be exploited by attackers to insert malware. Their approach concentrates

on the analysis of top-level functions of an Android app, which are the functions that are executed

the most when users interact with the app. The authors also used the text from the user interface

component. They analyzed the top-level functions and the extracted text to check whether they

match or not. In their work, AsDroid reported that 113 apps have stealthy behaviours, including

28 false positives and 11 false negatives.

Arzt et al. [Artz 14] developed a tool, called FlowDroid, which performs static taint analysis of

Android apps. The tool does not assume the presence of the app source code. Instead, it operates

on Jimple code. It could provide the location of source and sink in the taint flow path. The authors

applied this tool to Google Play apps, and succeed to detect vulnerabilities in 500 apps and around

1,000 malware apps used in the VirusShare project. Similarly, Huang et al. [Huang 14a] conducted

a study on applying taint analysis to Java-based web applications. They presented SFlow and

15

SFlowIner to make the dataflow and point-to-based taint flow analysis. In addition, their work

focuses on handling reflection, library and framework.

Huang et al. [Huang 16] conducted a study to detect sensitive data revelation in Android apps.

They proposed a novel static analysis technique, called BIDTEXT, that concentrates on the

variable-related text labels to check the potential disclosure of sensitive. They experimented with

a dataset of 10,000 Android apps downloaded from the Google Play store. The results show that

4,406 Android apps incur data disclosure through logging and HTTP requests.

Rasthofer et al. [Rasthofer 14] proposed SUSI, a novel machine-learning guided approach for

identifying sources and sinks directly from the code of any Android API. Based on Android apps,

they categorized the sources and sink using different labels. For the source, it has a unique

identifier, account, Bluetooth, etc. and for the sink, it has network, files, etc. Their approach was

evaluated on 11.000 malware samples and in their results achieve 92%.

Feng et al. [Feng 17] conducted a study on selecting the critical data flows based on the differences

between benign apps and malware apps. They presented a tool, Scoflow, to automatically detect

these critical data flows. They used these flows to help distinguish the malware abnormal sensitive

data usage. In their results, they compared Scoflow and Mudflow. They found their tool has high

rate of malware detection by 5.73%~9.07% on different dataset. They also argued that their tool

takes less memory space than Mudflow.

Rahul et al. [Rahul 13] conducted research on system permission of mobile apps by using the

analysis of nature language. The authors examined the app’s description and checked whether the

permissions requested by the app were justified. For example, a typical question is: why does this

app need this specific permission? They presented a framework based on natural language

16

processing called WHYPER. In the results, the framework achieved an average precision of 82.8%,

and an average recall of 81.5% for address book, calendar, and record audio permissions.

Summary

Logging is an important practice in mobile software development that is used by developers to

debug and analyze apps. There exist studies that focus on understanding the practice of logging in

general-stream software development with a focus on the quality of logs as well as where and how

to log. Logs, however, can be misused yielding security problems. To our knowledge, there are no

studies that examine the interplay between logging and security and data privacy, which is the

main objective of this thesis.

17

Chapter 3 - Mobilelike Approach

Overview

This thesis focuses on studying potential data leakage due to poor logging practices in released

Android mobile applications. Figure 3.1 shows an overview of our approach, called MobiLogLeak,

for detecting log statements in Android apps that may potentially leak private data. Our approach

consists of five steps. First, we convert the Android APK into an intermediary representation using

Jimple code [Bartel 12] in order to be able to analyze its content (Section 3.1). We do this because

we do not assume the presence of the course code. Then, we apply taint analysis to the resulting

Jimple code to identify the list of taint flow paths (Section 3.2). In our work, we focus only on

apps with taint flows since these are the potential sources for data leakage. Moreover, since this

study focuses on data leakage as a result of poor logging practices, we prune paths that are not log-

related. To do that, we use source-sink paths generated through taint flow analysis, and search the

sinks for possible log related statements (see Section 3.4 .

The results will be log-specific taint flow paths. After this, to obtain a better understanding of these

generated log related flows, we perform context analysis to analyze the code structure in order to

uncover the Android components that have the most log related flows (Section 3.4). Finally, we

manually inspect each of the taint flow paths to understand the type of the possible data leakage

cases (Section 3.6 ).

18

Figure 3.1 Overview of Our Approach

Converting App APK to Jimple code

Since we are interested in analyzing log statements in deployed apps, we cannot assume the

presence of the source code. We therefore need to reverse engineer the app APK to an intermediate

representation that we can analyze. To this end, we turn to the Soot framework [Bartel 12], which

is used for analyzing and transforming Java and Android apps to Jimple, a code format between

source code and Byte code that is commonly used in program analysis.

Although it is longer than the normal source code, Jimple code maintains the required program

constructs needed for program analysis, such as the function names, classes and code statements.

19

Hence, it can be used to identify the logging statements in an Android app. Figure 3.2 shows an

example of a Jimple code snippet of a logging statement that is generated from an APK using Soot.

Figure 3.2 Source code and its corresponding Jimple code

The Java code contains one static function, which returns an integer. Inside the function, there is

one while-loop to modify the variable resulting from multiplying it by 1 to x. Compared to Java

code, Jimple code also shows the function type, return value and parament. Rather than showing

the while-loop, Jimple uses “goto” to control the data flow in the process. The data is from label

20

0 to label 1 and then comes from the label 1 to label 0. Jimple code uses another way to achieve

the while-loop.

Taint Flow Analysis

The main aspect of our approach is to identify sources of data leaks that are related to log

statements. One way to identify data leaks is through taint flow analysis [Klieber 14]. The goal of

taint flow analysis is to check whether sensitive data remains within an expected application's

boundaries [Klieber 14]. Taint analysis can be applied statically or dynamically.

In this thesis, we apply static taint flow analysis on the generated Jimple code from the previous

step. We achieve this using FlowDroid [Arzt 14], which is a static analysis tool built on Soot that

allows us to retrieve the data flow between sources and sinks, hence uncovering all the paths that

are related to data leaks.

Figure 3.3 Taint flow analysis example

Arzt et al. [Arzt 14] proposed a tool, called FlowDroid, to help find the taint flows in Android

apps. The tool is based on the Soot framework and operates on Jimple code. FlowDroid takes the

21

apk file as input and builds a control flow graph, which shows the source and sink of taint flows.

Taint flow analysis can be described using the example of Figure 3.3. In Label 1, the variable w

gets the value from source(). Then, the source is transformed from x.f to a.g.f (Label 2 to Label 5).

Next, the variable b gets the variable a.g and in the end, the sink gets the b.f in Label 7, which is

the source in Label 1.

Generating the Source-Sink Log-Related Paths

The paths that are generated in Step 2 include all sources of data leaks (shown in Figure 3.4). In

this step, we refine the list of paths, by focusing only on paths that are related to log statements. In

other words, the goal is to retrieve only paths whose sinks contain a logging related statement.

In previous studies (e.g., [Chen 17a]), researchers filter the log statements using regular

expressions and keywords such as the ones shown in Figure 3.5. In other words, they look at

statements that contain the word “Log”. The problem is that this approach contains many false

positives such as expressions that contains words like “dialog” and “login”, which are not log

statements.

In Jimple, shown in Listing 1, log statements are accompanied with their library classes (e.g.,

android.util.Log), making it easy to filter the generated taint flow analysis paths by keeping only

those that contain log statements as their sink. A simple string match search looking for the log

related libraries in the sink for the taint flow paths suffices. The result of this step are all the taint

flow paths that are related to log statements. In the rest of this thesis, we refer to these paths as

ALRS (App Logging Related Sinks).

22

Figure 3.4 Example of a taint flow graph

Figure 3.5 Log Filter in Previous Research

23

Context Analysis of Log-Related Paths

In this step, we measure different aspects related to 'bad' log statements (logs that appear on the

taint flow analysis paths). These aspects include the component, the class, the block of this log

statement. In the component level, we check these logs are in Activity, Service, Broadcast Receiver

or Content Provider. In the class level, we print their java class names. In block level, we verify

whether they are in exception block or not. If yes, we go deep and find they are in the part of catch

or final.

For this, we use DroidFax [Cai 17], a software toolkit that is designed to assist developers in

program comprehension of Android applications. Particularly, DroidFax provides all sorts of

statistics about an Android app for quality assessment. Similar to FlowDroid [Arzt 14], it also

analyzes the application using Jimple code. We used FlowFax to check the log density, identify

the components that are logged the most, identify parts of the code that contain logs such as

exceptions and reflective functions. For example, in Listing 3, there is one log statement in Line

14. From this code snippet, we could generate Figure 3.6 for its details.

1. Class c.a.a.b.a

2. public void onPreviewFrame(byte[], android.hardware.Camera)

3. {

4. ...

5. java.lang.String $r17;

6. java.lang.NullPointerException $r18;

7. java.lang.ArrayIndexOutOfBoundsException $r19;

8. java.lang.Throwable $r20, $r21;

9. ...

10. label29:

11. $r16 := @caughtexception;

24

12. $r17 = virtualinvoke $r16.<java.lang.RuntimeException: java.lang.String toString()>();

13. $r20 = (java.lang.Throwable) $r16;

14. staticinvoke <android.util.Log: int e(java.lang.String,java.lang.String,java.lang.Throwable)>("ZXingScanner

View", $r17, $r20);

15. return;

16. ...

17. }

Listing 3: Log Snippet Example

As shown in Figure 3.6, we can directly find the context of log statement, including its method

name (onPreviewFrame) and class name (c.a.a.b.a). Furthermore, when we check the app

downloaded information, we can retrieve the app hash key (which is 00CFCA10 in our case) and

the package name (net.intricare.gobrowserkiosklockdown), highlighted in blue color in Figure

3.6.

25

Figure 3.6 Overview of Context Process

In addition, we can further use Soot to find more information such as the Activity component, the

code blocks, etc. (shown in green in Figure 3.6).

DroidFax provides the API to obtain the component type. The name is getComponentType and it

is shown in the Listing 4. It requires the soot class and return the type.

1. public static String getComponentType(SootClass cls) {

2. try {

3. if (fhar==null) {

4. fhar = Scene.v().getOrMakeFastHierarchy();

5. }

6. if (fhar.isSubclass(cls, iccAPICom.COMPONENT_TYPE_ACTIVITY))

7. return "Activity";

8. if (fhar.isSubclass(cls, iccAPICom.COMPONENT_TYPE_SERVICE) ||

26

9. fhar.isSubclass(cls, iccAPICom.COMPONENT_TYPE_GCMBASEINTENTSER VICECLASS) ||

fhar.isSubclass(cls, iccAPICom.COMPONENT_TYPE_GCMLISTENERSERVICECLASS))

10. return "Service";

11. if (fhar.isSubclass(cls, iccAPICom.COMPONENT_TYPE_RECEIVER))

12. return "BroadcastReceiver";

13. if (fhar.isSubclass(cls, iccAPICom.COMPONENT_TYPE_PROVIDER))

14. return "ContentProvider";

15. if (fhar.isSubclass(cls, iccAPICom.COMPONENT_TYPE_APPLICATION))

16. return "Application";

17. return "Unknown";

18. }

19. catch (Exception e) {

20. e.printStackTrace();

21. return "Unknown";

22. }

23. }

Listing 4: Get Component Type

When we use this function, we get the class before checking the log statement. It means we look

at the class at first and get the type. Then, all the log statements will be marked by this label.

Different from it, exception block detection is done at the level of blocks. It uses another object

named Body that helps check the exception features in the method.

27

Figure 3.7 App Detail in Google Play

In addition, we can cross reference the Hash Key and package name with information from

Android app stores (such as Google Play) such as the app name (Kiosk Browser Lockdown), the

number of downloads (more than 1,000) and the app category (Business) (Shown in Figure 3.7).

We compare the number of log statements in ALRS and those where no taint flow paths were

discovered. First, we compare the logging density between applications with taint flows (ALRS)

and those without. Second, we compare the context, in which the log statements appear, in other

words, the distribution of log statements in the different Android components: Activities, Services,

Content Providers, and Broadcast Receivers (shown in Figure 3.8).

For each app, we first identify the component that has the largest number of logs. Then, we

calculate how many apps with this component that have most logs in our dataset (AWLC).

To do it, we use the equation as below:

𝐶𝑜𝑚𝑝𝑜𝑛𝑒𝑛𝑡 𝑃𝑒𝑟𝑐𝑒𝑛𝑡𝑎𝑔𝑒 =𝐴𝑊𝐿𝐶

Total Apps in This Year

To compare the component percentage of Good apps and ALRS., we separately computing it and

list the result in Table 4-3.

28

To measure the density of logging, we use the following equation:

𝐷𝑒𝑛𝑠𝑖𝑡𝑦 =𝐿𝐿𝑂𝐶

SLOC

where LLOC refers to the total number of log statements of Jimple code and SLOC refers to the

total number of code source lines of Jimple Code.

Figure 3.8 Context Statistics Level

Meanwhile, we calculate the percentage of exception block and reflective method. Exception is

used often during Android development, and it often uses log to show the errors. To clarify how

many logs in this part, we used the following equation:

𝐸𝑥𝑐𝑒𝑝𝑡𝑖𝑜𝑛 𝑃𝑒𝑟𝑐𝑒𝑛𝑡𝑎𝑔𝑒 =𝐿𝐿𝐸

LLOC

Inside, the LLE is Line of Log statements in Exception block. Same to it, we calculated the

reflective calls by the equation:

29

𝑅𝑒𝑓𝑙𝑒𝑐𝑡𝑖𝑣𝑒 𝑃𝑒𝑟𝑐𝑒𝑛𝑡𝑎𝑔𝑒 =𝐿𝐿𝑅

LLOC

LLR is Line of Log statements in Reflective method.

Types of Log-Related Data Leakage

Figure 3.9 Source Categorization

The aim of this step is to categorize logging related data leakage cases. To this end, we manually

inspect each of the taint flow paths generated in Step 3 and categorize them into four identified

data leak types, which are database related, network related, location related, and account related.

As shown in Figure 3.9, we start from the source of taint flow path. In each source, the type is

different. From DroidFax, we could categorize it bash on its features by CatSource. CatSource is

one file which list all possible data source in taint flow analysis. We list four categories in Listing

5 and give five examples for each category.

30

1. LOCATION_INFORMATION:

2. <android.location.Country: java.lang.String getCountryIso()> (LOCATION_INFORMATION)

3. <com.android.server.location.PassiveProvider: long getStatusUpdateTime()> (LOCATION_INFORMATI

ON)

4. <android.telephony.TelephonyManager: java.lang.String getNetworkTypeName()> (LOCATION_INFORM

ATION)

5. <android.location.Address: int getMaxAddressLineIndex()> (LOCATION_INFORMATION)

6. <android.location.LocationRequest: long getInterval()> (LOCATION_INFORMATION)

7. NETWORK_INFORMATION:

8. <com.android.server.pm.PackageManagerService$ServiceIntentResolver: java.util.List queryIntentForPa

ckage(android.content.Intent,java.lang.String,int,java.util.ArrayList,int)> (NETWORK_INFORMATION)

9. <android.net.wifi.p2p.WifiP2pWfdInfo: int getControlPort()> (NETWORK_INFORMATION)

10. <com.android.server.AppWidgetServiceImpl: java.util.List getInstalledProviders()> (NETWORK_INFOR

MATION)

11. <android.support.v4.view.ViewPager$2: float getInterpolation(float)> (NETWORK_INFORMATION)

12. <com.android.internal.telephony.IccIoResult: java.lang.String toString()> (NETWORK_INFORMATION)

13.

14. ACCOUNT_INFORMATION:

15. <android.accounts.AccountManagerService$Session: android.accounts.IAccountManagerResponse get

ResponseAndClose()> (ACCOUNT_INFORMATION)

16. <android.accounts.AccountManager: android.accounts.AccountManagerFuture confirmCredentials(andro

id.accounts.Account,android.os.Bundle,android.app.Activity,android.accounts.AccountManagerCallback,and

roid.os.Handler)> (ACCOUNT_INFORMATION)

17. <android.accounts.AccountManagerService: android.accounts.Account[] getAccountsAsUser(java.lang.S

tring,int)> (ACCOUNT_INFORMATION)

18. <android.test.IsolatedContext$MockAccountManager: android.accounts.AccountManagerFuture getAcco

untsByTypeAndFeatures(java.lang.String,java.lang.String[],android.accounts.AccountManagerCallback,andr

oid.os.Handler)> (ACCOUNT_INFORMATION)

19. <android.accounts.IAccountManager$Stub$Proxy: java.lang.String getUserData(android.accounts.Accou

nt,java.lang.String)> (ACCOUNT_INFORMATION)

20.

31

21. SMS_MMS:

22. <com.google.android.mms.pdu.DeliveryInd: byte[] getMessageId()> android.permission.STOP_APP_SW

ITCHES (SMS_MMS)

23. <com.google.android.mms.pdu.AcknowledgeInd: byte[] getTransactionId()> (SMS_MMS)

24. <com.google.android.mms.pdu.NotifyRespInd: byte[] getTransactionId()> (SMS_MMS)

25. <com.google.android.mms.ContentType: java.util.ArrayList getVideoTypes()> (SMS_MMS)

26. <com.google.android.mms.pdu.PduBody: com.google.android.mms.pdu.PduPart getPartByContentId(jav

a.lang.String)> (SMS_MMS)

Listing 5: CatSource: Examples of Different Categories

For example, if we find the source is from the Android default function, getPartByContentId, we

will find the function in the CatSource file. Then, we could see the function in Line 27 of Listing

5. Therefore, we mark this source as SMS_MMS.

Summary

In this chapter, we presented our approach for detecting log statements that potentially leak

sensitive data using taint flow analysis that is generated by FlowDroid.

Our method uses static analysis of Jimple code generated from app APKs. The taint flow analysis

automatically narrows the range of apps which are potential to leak data from apps. In addition,

we study the context of the sinks and sources to understand where risky logs appear in the app.

For each log statement, we search the function that contains the statement and retrieve the

corresponding app component, class, function and code block. The manual analysis goes one step

further to uncover the leaked data.

32

Chapter 4 - Evaluation

Dataset Description

We applied MobiLogLeak to a dataset of a randomly collected sample of 5,000 Android

applications from AndroZoo [Li 17b] (2,500 apps were published in year 2017 and the other 2,500

are from 2018). We converted their APKs to Jimple code and used Flowdroid to apply taint flow

analysis. As shown in Figure 4.1, we found taint flow paths in 276 apps. We pruned paths that are

not log-related by searching the sink paths for log related statements. This resulted in 200

applications (shown in Table A1 of Appendix A) with taint flow paths that have log-related sinks.

Figure 4.1 Distribution of apps based in the dataset

33

The resulting 200 apps include apps from various app categories including music, finance,

education, fitness, etc. (as shown in Table 4-1). We found that about half of the dataset of apps

with log-related taint paths (102 apps) belong to the Personalization category. These apps collect

user profile information including all sort of people’s personal data. They manipulate more private

information than apps in other categories. For example, the app Pink Love Heart Keyboard Theme,

a Personalization app, uses logs to display network information through the Broadcast component.

We found that when this app receives data from other apps, this data is output through logging

statement. The complete list of log-related statements of the analyzed apps is shown in Table A1

of Appendix A.

Table 4-1 Categorization of apps in the dataset with log-related sinks

Category Number Category Number

Personalization 102 (51%) Music & Video 4 (2%)

Entertainment 13 (7.5%) Finance 4 (2%)

Business 8 (4%) News & Magazines 4 (2%)

Tools 8 (4%) Casino 2 (1%)

Communication 7 (3.5%) Travel & Local 2 (1%)

Lifestyle 6 (3%) Others 40 (20%)

34

Apps in other categories are also affected. For example, most entertainment apps require from

users to register with third-party social media accounts (i.e., Facebook, Twitter, Instagram), which

may lead to leakage of social media related information. The same applies to Business apps, which

manipulate information related to organizations.

Figure 4.2 Distribution of ALRS based on the number of installations

Figure 4.2 shows the distribution of ALRS based on the number of installations. We can see that

many of these applications are highly popular. In particular, three of these apps were installed

more than 100 million times, meaning that even though these apps count for only 1.5% of the total

ALRS, the number of impacted users can be considerably high.

In addition, about half of ALRS are installed less than 10 thousand times, suggesting that for these

less popular apps, developers do not pay the needed attention to logging issues. It appears that for

these apps, developers use logging for monitoring purposes, ignoring the potential threats that poor

logging practices may cause.

3

26

64

107

0

20

40

60

80

100

120

100 Million + 100 Thousand to 100 Million 10 Thousnad to 100 Thousand 10 Thousand -

35

Context Analysis Results

Table 4-2 shows that good apps have higher log density than ALRS. At first glace, this looks as a

surprising result. To further explore this, we separated the data into two datasets based on the year

of publication of the apps, i.e., 2017 and 2018, and recalculate the log density. The number of apps

in each year is 2,500. The results were consistent with our previous finding. One possible

explanation for this is that apps that are heavily logged are those written by experienced developers,

who also seem to apply good logging practices, such as removing log statements that could reveal

sensitive data before releasing the apps, but also use more logs for exception handling and for

explaining errors.

Table 4-2 Log context analysis

Type SLOC LLOC Density

Construct

Exception Reflective

Good Apps 436,167,099 1,205,557 1/362 323,295 (26.82%) 72,867 (6.04%)

ALRS 34,686,096 64,296 1/539 15,948 (24.80%) 2,272 (3.53%)

2,500 apps

in 2017

246,732,779 690,173 1/357 175,361 (25.41%) 41,994 (6.08%)

2,500 apps

in 2018

224,120,416 579,680 1/387 163,882 (28,27%) 33,145 (5.71%)

Total 470,853,195 1,269,853 1/371 339,243 (26.71%) 75,139 (5.92%)

36

Table 4-3 shows the distributions of logging statements through Android app components. The

results show that most logging occur in the Activity components, with almost no logging in the

Content provider component (zero in our case). We can see that the Activity component accounts

for 98.5% of all the logs in ALRS, suggesting that any further study for detecting logging

statements that may leak data should focus on the Activity component. The reason is that all the

interactions with users take place in Activity component such collecting the input stream, sending

text to background, and showing the results.

Table 4-3 Components for each app in different types

Component 4,800 Good Apps 200 ALRS Total 5000 Apps

Activity 4620 (96.25%) 197 (98.5%) 4817 (96.34%)

Service 15 (0.3125%) 0 15 (0.3%)

Broadcast Receiver 6 (0.125%) 1 (0.5%) 7 (0.14%)

Content Provider 0 0 0

Application 0 0 0

37

Unknown 159 (3.3125%) 2 (1%) 161 (3.22%)

As part of this analysis, we also studied if log-related sinks appear in reflective methods and

exception blocks as those two constructs normally contain log statements. We found that 25% of

the total log statements are in the exception blocks and 5% in the reflection methods. No poor

logging (i.e., sink related log statements) was reported in those two constructs.

Taint Flow Analysis Results

In this section, we show by example, how we manually inspect each of the taint flow paths

generated in Chapter 3.6 in order to categorize the log-related leakage cases based on the types of

leakage.

Recall that each taint flow path consists of a sink and a source. In our manual analysis, we start

from the source and then we find the corresponding API that is related to that source. Using the

description of the API, we can identify the actual data leaked in the corresponding taint flow path.

01. public static java.lang.String i(android.content.Context)

02. { ...

03. label05:

04. $r3 = virtualinvoke $r2.<android.telephony.TelephonyManager: java.lang.String getDeviceId()>();

05. label06:

06. $r4 = $r3;

07. $r5 = new java.lang.StringBuilder;

08. label07:

09. specialinvoke $r5.<java.lang.StringBuilder: void <init>()>();

10. $r5 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>($r3);

11. $r5 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>("device ID");

38

12. $r6 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.String toString()>();

13. staticinvoke <android.util.Log: int d(java.lang.String,java.lang.String)>("deviceIMEI", $r6);

14. ...

15. }

Listing 6: Code Snippet for Case Network Device ID

For example, in Listing 6 (the complete code is in Listing B1), the method getDeviceId() in Line

4 is a source that is related to the log-related sink that leaks the device IMEI in Line 13. To obtain

the API that corresponds to the source, we can look up the method "getDeviceId" in the list of

APIs provided by the Android software development kit (SDK).

Figure 4.3 Android API – Network: getDeviceId

39

Figure 4.3 shows the description of the API and the actual data retrieved by calling getDeviceID,

which will be then leaked through the log statement at the sink. In this case, it is the IMEI for

GSM. Based on the API description, we can categorize the leakage into one of the following four

categories.

Network, including Mac address, Device Id Sim Serial Number, country and package

manager.

Account, including name, token, password and type of the account owner.

Location, including latitude, longitude, and last-known location.

Database, including ID, password, subdomain, website link name, etc.

For the previous example, this data leakage is related to Network.

01. public static java.lang.String f(android.content.Context)

02. {

03. ...

04. $r4 = $r5.<android.accounts.Account: java.lang.String name>;

05. $r6 = $r5.<android.accounts.Account: java.lang.String type>;

06. ...

07. $r3 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>("Emails: ";

08. $r3 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>($r4);

09. $r6 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.String toString()>();

10. staticinvoke <android.util.Log: int e(java.lang.String,java.lang.String)>("PIKLOG", $r6);

11. ...

12. }

Listing 7: Code Snippet for Case Account Name

40

Account is another default object of Android. It represents an Account in the AccountManager.

This object is parcelable and also overrides equals(Object) and hashCode(), making it suitable for

use as the key of a Map. There is one example shown in the following Listing 7 (the complete

code is in Listing B2). As shown in Line 04, $r4 gets the account list and takes the value, name of

account. From Android API shown in Figure 4.4, we check the class “android.accounts.Account”.

Figure 4.4 Android API - Account

Then, in Line 08 of Listing 7, it appends this data into the String $r3 and in line 10, this statement

uses error function of logging to show it. According to the static text in Line 07, the name is the

“Emails”. Then, we can confirm the apps store the “email” as name and this code releases the

email here.

Location is a data class representing a geographic location. A location can consist of a latitude,

longitude, timestamp, and other information such as bearing, altitude and velocity. All locations

41

generated by the LocationManager are guaranteed to have a valid latitude, longitude, and

timestamp (both UTC time and elapsed real-time since boot), all other parameters are optional.

1. public android.location.Location a()

2. {

3. ...

4. label12:

5. $r0.<net.intricare.gobrowserkiosklockdown.services.f: android.location.Location d> = $r5;

6. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: android.location.Location d>;

7. if $r5 == null goto label18;

8. label13:

9. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: android.location.Location d>;

10. $d0 = virtualinvoke $r5.<android.location.Location: double getLatitude()>();

11. label14:

12. $r0.<net.intricare.gobrowserkiosklockdown.services.f: double e> = $d0;

13. label15:

14. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: android.location.Location d>;

15. $d0 = virtualinvoke $r5.<android.location.Location: double getLongitude()>();

16. label16:

17. $r0.<net.intricare.gobrowserkiosklockdown.services.f: double f> = $d0;

18. label17:

19. $r6 = new java.lang.StringBuilder;

20. specialinvoke $r6.<java.lang.StringBuilder: void <init>()>();

21. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>("latit

ude [");

22. $d0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: double e>;

23. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder append(double)>($d0);

24. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(" ]");

25. $r7 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.String toString()>();

26. staticinvoke <android.util.Log: int d(java.lang.String,java.lang.String)>("Lac", $r7);

42

27. $r6 = new java.lang.StringBuilder;

28. specialinvoke $r6.<java.lang.StringBuilder: void <init>()>();

29. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>("lon

gitude [");

30. $d0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: double f>;

31. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder append(double)>($d0);

32. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder append(java.lang.String)>(" ]");

33. $r7 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.String toString()>();

34. staticinvoke <android.util.Log: int d(java.lang.String,java.lang.String)>("Lac", $r7);

35. ...

36. }

Listing 8: Code Snippet for Case Location latitude and longitude

We have one example shown in the Listing 8 (the complete code is in Listing B3). Similar to the

other categories, in Line 15, it gets the longitude data and assigns it to $d0. Then, it appends this

data to String $r6 in Line 31, and converts it into String $r7 in Line 33. Finally, the program uses

debug function of logging to show it in Line 34.

Figure 4.5 Android API – Location getLatitude

43

Figure 4.6 Android API – Location getLongitude

Figure 4.5 and Figure 4.6 indicate the Android API for “getLatitude” and “getLongtitude”. These

two functions work for the “LocationManager”.

When we install an Android app, the system asks the user for permission to access location data.

For personal security or other reasons, we can deny the app from getting this permission. However,

even if we forbid an app to access the location, the apps could access the log file of other apps to

get our location information, which circumvents Android permission mechanism.

The process described so far can help in retrieving the corresponding API of a source (hence the

actual data) where the leakage is related to network, account, or location type. Unfortunately, this

process may not work in complex situations, such as in the case of the database leakage type. In

such scenarios, the path from the source to the sink is normally long and can take several

alternatives. Moreover, the automatically identified source (by Flowdroid) does not provide us

with enough information about the real source of the data that is leaked.

44

01. private boolean a(java.lang.String)

02. {

03. ...

04. $r3 = specialinvoke $r0.<net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingServic

e: java.lang.String a()>();

05. ...

06. staticinvoke <android.util.Log: int i(java.lang.String,java.lang.String)>("MyFirebaseMsgService", $r3);

07. staticinvoke <android.util.Log: int i(java.lang.String,java.lang.String)>("MyFirebaseMsgService", "oldpassw

ord and newpassword matches. save new password");

08. ...

09. }

10. ------------------------------------------------------------------------------------------------

11. private java.lang.String a()

12. {

13. ...

14. $r1 = virtualinvoke $r2.<net.intricare.gobrowserkiosklockdown.b.b: java.lang.String e()>();

15. return $r1;

16. }

17. ------------------------------------------------------------------------------------------------

18. public java.lang.String e()

19. {

20. ...

21. $i0 = interfaceinvoke $r3.<android.database.Cursor: int getColumnIndex(java.lang.String)>("password");

22. $r4 = interfaceinvoke $r3.<android.database.Cursor: java.lang.String getString(int)>($i0);

23. return $r4;

24. ...

25. }

Listing 9: Code Snippet for Case Database Password

45

Hence, the analysis normally includes tracking the real source. For example, in Listing 9 (the

complete code is in Listing B4), starting from the source in Line 22, Flowdroid pointed to a source

for a database leakage. In order to identify the actual leaked data, a thorough analysis is required

starting from the log statement at the sink to the actual source. Once we identify the real source,

based on the parameters, we can find which query the source used and what are the fields in that

query (shown in Figure 4.7)

Figure 4.7 ALRS Apps Source Categories

For example, in Listing 9, to track the data and confirm the real path, we start from the sink in

Line 06 in Listing 9. It is the log statement that contains the variable $r3. We go back and look for

the value of $r3 and in Line 04, we find it is from the function, private Boolean a(java.lang.String).

Then, we go to this function and check it. We know the final return value is $r1 in Line 15. It

comes from the function 'public java.lang.String e()' in Line 14. Then we trace this function shown

in Line 18. In the function, public java.lang.String e(), we could find the final return value is $r4.

46

Based on FlowDroid, the source is in Line 22. From Figure 4.8, we can relate this statement to the

function “getString”, which returns the value of that column.

Figure 4.8 Android API - Database getString

However, if we stop here, we can only know the data from the database. We need to go further to

identify the real source and the value from the database that is leaked. This information cannot be

inferred from getString(int)($i0). To solve this, we read this function and check it again. Luckily,

we find the $r4 is from the query operation, getColumnIndex(java.lang.String)>("password") in

Line 21. As shown in Figure 4.9, the function, getColumnIndex returns the index of the given

variable. In the code, the given variable is password, so this statement returns the index of

password and uses the index to get its value in Line 22 of Listing 9.

47

Figure 4.9 Android API - Database getColumnIndex

In this case, the source is transformed into two other functions, and we can check the static text of

logging in Line 07. It is “oldpassword and newpassword matches. save new password”. It means

this logging happens during the change of passwords. As we mentioned before, most taint flows

are likely caused by developers who use log for debugging during development but forget to delete

this type of log statement before deployment.

We applied this manual analysis to the 200 apps with log-related sinks (i.e., ALRS). In total, there

were 380 sources with elements of sensitive data, and 293 sinks with log statements related to the

data elements in the source. Out of the 380 sources, 186 (49%) leaked network sensitive data, 170

(45%) leaked database sensitive data, 22 (5%) leaked location sensitive data and 2 (0.5%) sources

leaked user account data.

48

These results, although preliminary, clearly demonstrate that logging, when not used carefully,

may lead to the leakage of sensitive information. There is a need to raise awareness around this

topic and start developing logging guidelines to prevent these situations. We suspect that the cases

we found are caused by developers who may have needed these logs during development, but

omitted to remove them before releasing the apps. We need to dig further to understand (1) the

scale of this problem, and (2) the causes.

Threats to validity

Internal Validity: We manually analyzed all the taint flow analysis paths that contain log

statements as sinks. Three authors checked the results. Because this is done manually, errors may

have occurred, which we recognize as a threat to internal validity. Another threat is related to the

selection of the 5,000 apps. We selected these apps randomly, but a different set may lead to

different results.

External Validity: Software engineering studies suffer from the variability of the real world, and

the generalization problem cannot be solved completely. Although we have used 5,000 apps in

this study (to reduce the risk of insufficient generalization), our evaluation remains preliminary

and should be qualified as an early research, and may not be generalizable to other apps.

Limitations

In our dataset, some of them cannot be analyzed by soot, the exceptions of building graph failed.

This is attributed to the limitations of static analysis. Flowdroid is known to be unable to analyze

some apps due to various reasons (e.g., corrupted DEX code, missing app assets, obfuscated code,

use of special features that are not accounted for by the analysis, etc.)

49

Flowdroid and droidfax are based on Soot. During the experiment, we have three kinds of apps.

The first group works well with Soot. They are built the process graph and return the amount for

the method categories. Second group just throws the exception like java.lang.AssertionError but

it also gives me the result. For this group, we ignore the exception. Obviously, the exception

appears after the result comes out. For the third group, it just gives me one exception like

NonePointException. This group will influence our accuracy, so when we calculate the percentage,

we will ignore this group.

Also, for the taint flows, some apps use query instead of getcolumn to show the data. For these

applications, we cannot know what kind of information is leaked. For example,

$r9 = interfaceinvoke $r15.<android.database.Cursor: java.lang.String getString(int)>(1)

It checks the index of the column so we cannot know the column name. It depends on the structure

of the database and each record returned by the query operation.

50

Chapter 5 - Conclusion and Future Work

Research Contributions

In this study, we investigated the impact of logging practices on data leakage. Particularly, we

explored how common are poor logging practices in mobile applications and the effect of not

removing logs related to sensitive information before releasing the application. Our preliminary

results show that log statements are common in the released mobile applications. There is one log

statement in each 362 lines of code. Not all these logs are a result of bad logging practices. For

example, in our study on 5,000 mobile applications, none of the log statements in the exception

blocks leaked sensitive data. On the other hand, poor logging practices are also common in mobile

applications. Out of 276 apps with taint flows, 200or around 72% leaked sensitive data due to poor

logging practices. We categorized the data leakages that are related to logging practices into four

types. The results demonstrate that in the 200 apps that we manually analyzed, there were

380sources of data leakage, 186 of these sources leaked network sensitive data, 170 leaked

database sensitive data, 22 leaked location sensitive data and 2 sources leaked user account data.

Opportunities for Further Research

Our preliminary results suggest a correlation between the context of logging practices and whether

these practices are good or poor, further investigation is required as part of future work. Another

interesting future direction is to automatically classify the leaked data in the categories network,

account, location, and database.

51

For the callback functions, we need to improve our method for analyzing them because they are

always used when systems want to manage the app lifecycle. Also, we should continue examining

log statements that appear in exception blocks, used by developers to uncover the root causes of

defects, to prevent situations where logged variables may adversely leak sensitive data.

Moreover, we should continue experimenting with more apps, including commercial apps, from

various categories to understand the scope of this problem and to design techniques for preventing

it. Priority should be given to popular apps since they can affect a large number of users.

In addition, we should develop automated techniques to detect potential data leaks in existing log

data. However, the large size of typical log data may render this task very difficult. Tools for

achieving this should be equipped with some sort of log abstraction techniques (such as the ones

surveyed by El-Masri et al. [El-Masri 2020]) in order to reduce the large size of log data while

keeping the main information. There exist also several abstraction techniques used in tracing such

as the ones proposed by Hamou-Lhadj et al. [Hamou-Lhadj 2002, Hamou-Lhadj 2006] that can be

readily applied to the abstraction of large streams of log data.

Finally, we should study whether the use of logging can be reduced by resorting to other dynamic

analysis techniques such as tracing of program flows and program profiling. These approaches do

not require as much user input as logging, which may reduce errors related to data leakage.

52

Appendix A. Results of the experiments

Table A1. 200 Taint Apps Dataset

Name Installations Label

CM Launcher 3D - Themes, Wallpapers 100,000,000 Personalization

Power Clean - Antivirus & Phone Cleaner App 100,000,000 Tools

Bit Clean 100,000,000 Tools

CM Browser - Ad Blocker, Fast Download, Privacy

5.22.21.0051 50,000,000 Communication

Opera News 50,000,000 News & Magazines

Emoji Keyboard - Cute Emoji, GIF, Sticker, Emoticon 50,000,000 Tools

Starbucks 10,000,000 Food & Drink

Moto File Manager 10,000,000 Tools

ESPNCricinfo - Live Cricket Scores, News & Videos 10,000,000 Sports

Cool Black Theme 10,000,000 Entertainment

Taichi Panda 10,000,000 Role Playing

Opera Mini browser beta 10,000,000 Communication

Gradeup: Exam Preparation App | Free Mocks | Class 5,000,000 Education

DdcatApp 1,450,000 Entertainment

Anime Wallpaper Master 1,000,000 Personalization

Say caller name 1,000,000 Tools

SignEasy | Sign and Fill PDF and other Documents 1,000,000 Business

53

Pink Glitter Bow Keyboard Theme 1,000,000 Personalization

Rummy 45 - Remi Etalat 1,000,000 Card

Alive In Shelter 1,000,000 Strategy

Facemoji Keyboard Lite: GIF, Emoji, DIY Theme 1,000,000 Personalization

Pixel Doodle: Color by Number, Pixel Art, Color Game 1,000,000

Video Editor &

Video Maker Dev

God christ keyboard 1,000,000 Personalization

Lucky Slots - Free Casino Game 1,000,000 Casino

Messaging+ SMS, MMS Free 1,000,000 Communication

Color shiny rose theme keyboard 1,000,000 Personalization

Pink Love Heart Keyboard Theme 1,000,000 Personalization

Messaging+ SMS, MMS Free 1,000,000 Communication

Summer GO Keyboard Theme 1,000,000 Personalization

Fairy live wallpaper 500,000 Personalization

Silk Gold Icons Theme 500,000 Personalization

Black Cool Wolf King Theme 500,000 Personalization

ForteBank 500,000 Finance

Entertainment 500,000 قصص الصغار

Golden Fidget Spinner Theme 500,000 Personalization

Equalizer Pro - Extra Sound 500,000 Music & Audio

Indian Girls Photo Maker 500,000 Photography

Arab Man Fashion Photo Suit 500,000 Photography

54

Custom Wallpaper Maker FREE 100,000 Tools

Graffiti Skull Smoke keyboard 100,000 Personalization

Cute Cartoon Owl Bowknot Theme 100,000 Personalization

Neon Butterfly Live Wallpaper 100,000 Personalisation

Gold Luxury Car Theme 100,000 Personalization

Cute Girl Theme 100,000 Lifestyle

Sketched Street View 100,000 Personalization

Theme for Galaxy S9 Plus 100,000 Personalization

Crimson Horrific White Eyes Theme 100,000 Personalization

Gun Man Photo Montage 100,000 Entertainment

The Purple Fantasy Wonderland Theme 100,000 Personalization

Calisteniapp - Calisthenics & Street Workout 100,000 Health & Fitness

Krishna Flute Ringtones 100,000 Music & Audio

Pink Minny Diamond keyboard 100,000 Personalization

BankSA Mobile Banking 100,000 Finance

Black Gold Business Theme 100,000 Personalization

Hijau Gothic Logam Coretan Tengkorak Tema 100,000 Personalization

Tavla 100,000 Board

Pink Rose Black Lace Theme 100,000 Personalization

Heathrow Express 100,000 Travel & Local

Shiny Neon Love Launcher 100,000 Personalization

Citations de La Vie 100,000 Entertainment

55

Dhoka Shayari 100,000 Social

Glass Water Drop Keyboard 100,000 Personalisation

Panda Sakura Theme 100,000 Personalization

Goal Achiever's Multitool - Goalist 100,000 Productivity

Motobike Sport Thème 100,000 Personalisation

Dard Shayari 100,000 Entertainment

Subway Bus Racer 50,000 Racing

Pink Love Butterfly Keyboard 50,000 Personalization

Crystal Rose Love 3D Theme 50,000 Personalization

com.hld.anzenbokusu 50,000 Tools

Tattoo Rose Romantic Wolf Theme 50,000 Personalization

Space Planet 3D Earth Theme 50,000 Lifestyle

Romantic Red Love Heart Theme 50,000 Personalization

cute skull icon pink bow theme 50,000 Personalization

Typewriter Keyboard 50,000 Personalization

Blue Flames Keyboard Theme 50,000 Personalization

Purple Luxury Golden Butterfly Theme 50,000 Personalization

Pink Cute Flower Rose Red Petals Keyboard Theme 50,000 Personalisation

3D Green Maple Leaf 50,000 Personalization

悟空问答 20,000 Communication

HTC Yellow Pages 10,000 Business

Lavender water drop keyboard theme 10,000 Personalization

56

Neon Koi Fish Space 3D Theme 10,000 Personalization

Thermal Camera Filter Effect Flashlight 10,000 Entertainment

Blackboard Graffiti Theme 10,000 Personalization

Water Fire Fist Battle Keyboard Theme 10,000 Personalization

Movilizer 10,000 Business

Spoint 10,000 Lifestyle

Butterfly Fairy Nature Theme 10,000 Personalization

Girly Paris keyboard theme 10,000 Personalization

Glitter Blue Dream Theme - glitter wallpaper 10,000 Personalization

Weed Ghost Gun Launcher Theme 10,000 Personalization

Tumbi 10,000 Music & Audio

Blue Glitter Cute Panda Keyboard 10,000 Personalization

Animated Cute Pink Hearts Keyboard 10,000 Beauty

Турецкий для влюблённых (DEMO) 10,000 Travel & Local

Orange Cartoon Cute Lazy Cat Theme 10,000 Personalization

Pink Black Kitty Love Theme 10,000 Lifestyle

Black Business Gold Theme 10,000 Personalization

Judai Shayari 10,000 Social

SuperSMS - Text Messages 10,000 Communication

Fire Lion Theme 10,000 Personalisation

Cute Anime Girl 3D 10,000 Personalisation

Cute Dog Love Theme 5,000 Lifestyle

57

Citations de Nicolas Michiavel 5,000 Education

Bubble Spinner Deluxe 5,000 Casual

Cute Snow Penguin Baby Theme 5,000 Personalization

Black Skull Keyboard Theme 5,000 Personalization

Cute Panda Girl Theme 5,000 Personalization

Consumer Protection Act 5,000 Books & Reference

Cute Unicorn Whale 5,000 Personalization

Bullet Gun Theme 5,000 Personalization

Cool Speedy Racing car keyboard 5,000 Personalization

My BVF 1,000 House & Home

Electronic Music DJ Mellow Theme 1,000 Entertainment

Diamond pink leopard theme 1,000 Personalisation

GGM VIEW 1,000 Business

Christmas Gift Tree Gravity Theme 1,000 Personalization

Greenback Slots – Big Win 1,000 Casino

Therapeutic Lucky Koi Fish Satisfaction Theme 1,000 Personalization

Omni 1,000 Education

Pink SMS Keyboard Theme Diamond Ribbon 1,000 Beauty

Hot Flame Evil Skull Keyboard Theme 1,000 Entertainment

Gold Sequins Flip Keyboard Theme 1,000 Personalization

3D Galaxy Earth Moon Parallax Theme 1,000 Personalization

Kiosk Browser Lockdown 1,000 Business

58

Colorful Abstract Simple Theme 1,000 Personalization

Panda Unicorn Star Galaxy Keyboard Theme 1,000 Personalization

Gold Christmas Theme Wallpaper 1,000 Personalization

Glow Neon Wolf Theme 1,000 Personalization

Dark Neon Panther Theme 1,000 Personalization

Multi Level Car Parking-Crazy Driving School 1,000 Simulation

Cute Bear Keyboard Theme 1,000 Personalization

Colorful Water Drop Flower Theme 1,000 Personalization

Green Football Pitch Theme 1,000 Personalization

Angry Flame Tiger Keyboard Theme 1,000 Personalization

Golden Effiel Tower Keyboard Theme 1,000 Personalization

TWICE WALLPAPER 2018 1,000 Personalization

Graffiti Theme Street Art 1,000 Personalisation

Gold Luxury Car Theme 1,000 Personalisation

Twinkle Gemstone Theme 1,000 Personalisation

The Konnected 500 Entertainment

Golden Green Butterfly Theme 500 Personalization

Nellore 500 News & Magazines

Cool Rain Glass Waterdrop Theme 500 Personalization

Tech Skull Red Live Keyboard 500 Personalization

Horror Bloody Bat Theme 500 Personalization

Snake Color Box Keyboard Theme 500 Personalization

59

Father Chritsmas Elk Theme 500 Personalization

Purple Neon Business Theme 500 Personalization

Modern Trade SFA- PepUpSales 500 Business

New Year Pink Kitty Theme 100 Personalization

Hoshiarpur 100 News & Magazines

Brunet Swan Love keyboard 100 Personalization

Radio Miel de Dios 100 Communication

VP Tandem 100 Business

Mandla 100 News & Magazines

Fantasy Flowers Live Wallpaper 100 Personalization

Sunlit Flowery Day Theme 100 Personalization

Tiny Bowman Hero: Archery Rescue Challenge 100 Comics

Cutie Bear Heart keyboard 100 Personalisation

BIGG B 50 Music & Audio

蒲公英 50 Photography

Fast Mountain Train Driver : Train Simulator 2018 50 game

고창동막골농장 5 Shopping

60

Appendix B. Code Snippet for Cases

Listing B1. Network

1. Class: net.intricare.gobrowserkiosklockdown.util.aa

2. public static java.lang.String i(android.content.Context)

3. {

4. android.content.Context $r0;

5. java.lang.Object $r1;

6. android.telephony.TelephonyManager $r2;

7. int $i0;

8. java.lang.String $r3, $r4, $r6;

9. java.lang.StringBuilder $r5;

10. java.lang.Throwable $r7, $r8;

11.

12. $r0 := @parameter0: android.content.Context;

13.

14. label01:

15. $r1 = virtualinvoke $r0.<android.content.Context: java.lang.Object

getSystemService(java.lang.String)>("phone");

16.

17. label02:

18. $r2 = (android.telephony.TelephonyManager) $r1;

19.

20. label03:

21. $i0 = staticinvoke <android.support.v4.app.a: int

a(android.content.Context,java.lang.String)>($r0,

"android.permission.READ_PHONE_STATE");

22.

61

23. label04:

24. if $i0 == 0 goto label05;

25.

26. return "";

27.

28. label05:

29. $r3 = virtualinvoke $r2.<android.telephony.TelephonyManager: java.lang.String

getDeviceId()>();

30.

31. label06:

32. $r4 = $r3;

33.

34. $r5 = new java.lang.StringBuilder;

35.

36. label07:

37. specialinvoke $r5.<java.lang.StringBuilder: void <init>()>();

38.

39. $r5 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>($r3);

40.

41. $r5 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>("device ID");

42.

43. $r6 = virtualinvoke $r5.<java.lang.StringBuilder: java.lang.String

toString()>();

44.

45. staticinvoke <android.util.Log: int

d(java.lang.String,java.lang.String)>("deviceIMEI", $r6);

46.

47. label08:

62

48. return $r3;

49.

50. label09:

51. $r7 := @caughtexception;

52.

53. label10:

54. virtualinvoke $r7.<java.lang.Exception: void printStackTrace()>();

55.

56. return $r4;

57.

58. label11:

59. $r8 := @caughtexception;

60.

61. $r4 = "";

62.

63. $r7 = $r8;

64.

65. goto label10;

66.

67. catch java.lang.Exception from label01 to label02 with label11;

68. catch java.lang.Exception from label03 to label04 with label11;

69. catch java.lang.Exception from label05 to label06 with label11;

70. catch java.lang.Exception from label07 to label08 with label09;

71. }

63

Listing B2. Account

1. Class: net.intricare.gobrowserkiosklockdown.util.aa

2. public static java.lang.String f(android.content.Context)

3. {

4. android.content.Context $r0;

5. android.accounts.AccountManager $r1;

6. android.accounts.Account[] $r2;

7. java.lang.StringBuilder $r3;

8. int $i0, $i1;

9. java.lang.String $r4, $r6;

10. android.accounts.Account $r5;

11. boolean $z0;

12. java.lang.Throwable $r7, $r8;

13.

14. $r0 := @parameter0: android.content.Context;

15.

16. label01:

17. $r1 = staticinvoke <android.accounts.AccountManager:

android.accounts.AccountManager get(android.content.Context)>($r0);

18.

19. $r2 = virtualinvoke $r1.<android.accounts.AccountManager:

android.accounts.Account[] getAccounts()>();

20.

21. label02:

22. $r3 = new java.lang.StringBuilder;

23.

24. label03:

25. specialinvoke $r3.<java.lang.StringBuilder: void <init>()>();

26.

64

27. $r3 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>("Size: ");

28.

29. label04:

30. $i0 = lengthof $r2;

31.

32. label05:

33. $r3 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder

append(int)>($i0);

34.

35. $r4 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.String

toString()>();

36.

37. staticinvoke <android.util.Log: int

e(java.lang.String,java.lang.String)>("PIKLOG", $r4);

38.

39. label06:

40. $i0 = lengthof $r2;

41.

42. $i1 = 0;

43.

44. label07:

45. if $i1 >= $i0 goto label16;

46.

47. $r5 = $r2[$i1];

48.

49. $r4 = $r5.<android.accounts.Account: java.lang.String name>;

50.

51. $r6 = $r5.<android.accounts.Account: java.lang.String type>;

52.

65

53. label08:

54. $z0 = virtualinvoke $r6.<java.lang.String: boolean

equals(java.lang.Object)>("com.google");

55.

56. label09:

57. if $z0 == 0 goto label12;

58.

59. $r3 = new java.lang.StringBuilder;

60.

61. label10:

62. specialinvoke $r3.<java.lang.StringBuilder: void <init>()>();

63.

64. $r3 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>("Emails: ");

65.

66. $r3 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>($r4);

67.

68. $r6 = virtualinvoke $r3.<java.lang.StringBuilder: java.lang.String

toString()>();

69.

70. staticinvoke <android.util.Log: int

e(java.lang.String,java.lang.String)>("PIKLOG", $r6);

71.

72. label11:

73. return $r4;

74.

75. label12:

76. $i1 = $i1 + 1;

77.

66

78. goto label07;

79.

80. label13:

81. $r7 := @caughtexception;

82.

83. $r4 = null;

84.

85. $r8 = $r7;

86.

87. label14:

88. virtualinvoke $r8.<java.lang.Exception: void printStackTrace()>();

89.

90. return $r4;

91.

92. label15:

93. $r8 := @caughtexception;

94.

95. goto label14;

96.

97. label16:

98. return null;

99.

100. catch java.lang.Exception from label01 to label02 with label13;

101. catch java.lang.Exception from label03 to label04 with label13;

102. catch java.lang.Exception from label05 to label06 with label13;

103. catch java.lang.Exception from label08 to label09 with label13;

104. catch java.lang.Exception from label10 to label11 with label15;

105. }

67

Listing B3. Location

1. Class: net.intricare.gobrowserkiosklockdown.services.f

2. public android.location.Location a()

3. {

4. net.intricare.gobrowserkiosklockdown.services.f $r0;

5. android.content.Context $r1;

6. java.lang.Object $r2;

7. android.location.LocationManager $r3;

8. boolean $z0;

9. android.os.Looper $r4;

10. android.location.Location $r5;

11. double $d0;

12. java.lang.StringBuilder $r6;

13. java.lang.String $r7;

14. java.lang.Throwable $r8, $r9, $r10;

15.

16. $r0 := @this: net.intricare.gobrowserkiosklockdown.services.f;

17.

18. $r1 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.content.Context h>;

19.

20. label01:

21. $r2 = virtualinvoke $r1.<android.content.Context: java.lang.Object

getSystemService(java.lang.String)>("location");

22.

23. label02:

24. $r3 = (android.location.LocationManager) $r2;

25.

68

26. $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g> = $r3;

27.

28. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

29.

30. label03:

31. $z0 = virtualinvoke $r3.<android.location.LocationManager: boolean

isProviderEnabled(java.lang.String)>("gps");

32.

33. label04:

34. $r0.<net.intricare.gobrowserkiosklockdown.services.f: boolean a> = $z0;

35.

36. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

37.

38. label05:

39. $z0 = virtualinvoke $r3.<android.location.LocationManager: boolean

isProviderEnabled(java.lang.String)>("network");

40.

41. label06:

42. $r0.<net.intricare.gobrowserkiosklockdown.services.f: boolean b> = $z0;

43.

44. $z0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: boolean a>;

45.

46. if $z0 != 0 goto label08;

47.

48. $z0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: boolean b>;

49.

50. if $z0 != 0 goto label08;

69

51.

52. label07:

53. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

54.

55. return $r5;

56.

57. label08:

58. $r0.<net.intricare.gobrowserkiosklockdown.services.f: boolean c> = 1;

59.

60. $z0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: boolean b>;

61.

62. if $z0 == 0 goto label24;

63.

64. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

65.

66. label09:

67. $r4 = staticinvoke <android.os.Looper: android.os.Looper getMainLooper()>();

68.

69. virtualinvoke $r3.<android.location.LocationManager: void

requestLocationUpdates(java.lang.String,long,float,android.location.LocationLis

tener,android.os.Looper)>("network", 60000L, 10.0F, $r0, $r4);

70.

71. staticinvoke <android.util.Log: int

d(java.lang.String,java.lang.String)>("Network", "Network");

72.

73. label10:

74. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

70

75.

76. if $r3 == null goto label24;

77.

78. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

79.

80. label11:

81. $r5 = virtualinvoke $r3.<android.location.LocationManager:

android.location.Location getLastKnownLocation(java.lang.String)>("network");

82.

83. label12:

84. $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d> = $r5;

85.

86. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

87.

88. if $r5 == null goto label24;

89.

90. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

91.

92. label13:

93. $d0 = virtualinvoke $r5.<android.location.Location: double getLatitude()>();

94.

95. label14:

96. $r0.<net.intricare.gobrowserkiosklockdown.services.f: double e> = $d0;

97.

98. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

71

99.

100. label15:

101. $d0 = virtualinvoke $r5.<android.location.Location: double getLongitude()>();

102.

103. label16:

104. $r0.<net.intricare.gobrowserkiosklockdown.services.f: double f> = $d0;

105.

106. $r6 = new java.lang.StringBuilder;

107.

108. label17:

109. specialinvoke $r6.<java.lang.StringBuilder: void <init>()>();

110.

111. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>("latitude [");

112.

113. label18:

114. $d0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: double e>;

115.

116. label19:

117. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder

append(double)>($d0);

118.

119. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>(" ]");

120.

121. $r7 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.String

toString()>();

122.

123. staticinvoke <android.util.Log: int

d(java.lang.String,java.lang.String)>("Lac", $r7);

72

124.

125. label20:

126. $r6 = new java.lang.StringBuilder;

127.

128. label21:

129. specialinvoke $r6.<java.lang.StringBuilder: void <init>()>();

130.

131. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>("longitude [");

132.

133. label22:

134. $d0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: double f>;

135.

136. label23:

137. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder

append(double)>($d0);

138.

139. $r6 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>(" ]");

140.

141. $r7 = virtualinvoke $r6.<java.lang.StringBuilder: java.lang.String

toString()>();

142.

143. staticinvoke <android.util.Log: int

d(java.lang.String,java.lang.String)>("Lac", $r7);

144.

145. label24:

146. $z0 = $r0.<net.intricare.gobrowserkiosklockdown.services.f: boolean a>;

147.

148. if $z0 == 0 goto label07;

73

149.

150. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

151.

152. if $r5 != null goto label07;

153.

154. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

155.

156. label25:

157. $r4 = staticinvoke <android.os.Looper: android.os.Looper getMainLooper()>();

158.

159. virtualinvoke $r3.<android.location.LocationManager: void

requestLocationUpdates(java.lang.String,long,float,android.location.LocationLis

tener,android.os.Looper)>("gps", 60000L, 10.0F, $r0, $r4);

160.

161. staticinvoke <android.util.Log: int

d(java.lang.String,java.lang.String)>("GPS Enabled", "GPS Enabled");

162.

163. label26:

164. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

165.

166. if $r3 == null goto label07;

167.

168. $r3 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.LocationManager g>;

169.

170. label27:

74

171. $r5 = virtualinvoke $r3.<android.location.LocationManager:

android.location.Location getLastKnownLocation(java.lang.String)>("gps");

172.

173. label28:

174. $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d> = $r5;

175.

176. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

177.

178. if $r5 == null goto label07;

179.

180. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

181.

182. label29:

183. $d0 = virtualinvoke $r5.<android.location.Location: double getLatitude()>();

184.

185. label30:

186. $r0.<net.intricare.gobrowserkiosklockdown.services.f: double e> = $d0;

187.

188. $r5 = $r0.<net.intricare.gobrowserkiosklockdown.services.f:

android.location.Location d>;

189.

190. label31:

191. $d0 = virtualinvoke $r5.<android.location.Location: double getLongitude()>();

192.

193. label32:

194. $r0.<net.intricare.gobrowserkiosklockdown.services.f: double f> = $d0;

195.

75

196. goto label07;

197.

198. label33:

199. $r8 := @caughtexception;

200.

201. label34:

202. virtualinvoke $r8.<java.lang.SecurityException: void printStackTrace()>();

203.

204. label35:

205. goto label07;

206.

207. label36:

208. $r9 := @caughtexception;

209.

210. virtualinvoke $r9.<java.lang.Exception: void printStackTrace()>();

211.

212. goto label07;

213.

214. label37:

215. $r10 := @caughtexception;

216.

217. label38:

218. virtualinvoke $r10.<java.lang.SecurityException: void printStackTrace()>();

219.

220. label39:

221. goto label24;

222.

223. catch java.lang.Exception from label01 to label02 with label36;

224. catch java.lang.Exception from label03 to label04 with label36;

225. catch java.lang.Exception from label05 to label06 with label36;

76

226. catch java.lang.SecurityException from label09 to label10 with label37;

227. catch java.lang.SecurityException from label11 to label12 with label37;

228. catch java.lang.SecurityException from label13 to label14 with label37;

229. catch java.lang.SecurityException from label15 to label16 with label37;

230. catch java.lang.SecurityException from label17 to label18 with label37;

231. catch java.lang.SecurityException from label19 to label20 with label37;

232. catch java.lang.SecurityException from label21 to label22 with label37;

233. catch java.lang.SecurityException from label23 to label24 with label37;

234. catch java.lang.Exception from label09 to label10 with label36;

235. catch java.lang.Exception from label11 to label12 with label36;

236. catch java.lang.Exception from label13 to label14 with label36;

237. catch java.lang.Exception from label15 to label16 with label36;

238. catch java.lang.Exception from label17 to label18 with label36;

239. catch java.lang.Exception from label19 to label20 with label36;

240. catch java.lang.Exception from label21 to label22 with label36;

241. catch java.lang.Exception from label23 to label24 with label36;

242. catch java.lang.SecurityException from label25 to label26 with label33;

243. catch java.lang.SecurityException from label27 to label28 with label33;

244. catch java.lang.SecurityException from label29 to label30 with label33;

245. catch java.lang.SecurityException from label31 to label32 with label33;

246. catch java.lang.Exception from label25 to label26 with label36;

247. catch java.lang.Exception from label27 to label28 with label36;

248. catch java.lang.Exception from label29 to label30 with label36;

249. catch java.lang.Exception from label31 to label32 with label36;

250. catch java.lang.Exception from label34 to label35 with label36;

251. catch java.lang.Exception from label38 to label39 with label36;

252. }

77

Listing B4. Database

1. Class: net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService

2. private boolean a(java.lang.String)

3. {

4.

net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService

$r0;

5. java.lang.String $r1, $r3;

6. java.lang.StringBuilder $r2;

7. net.intricare.gobrowserkiosklockdown.b.b $r4;

8. int $i0;

9.

10. $r0 := @this:

net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService;

11.

12. $r1 := @parameter0: java.lang.String;

13.

14. $r2 = new java.lang.StringBuilder;

15.

16. specialinvoke $r2.<java.lang.StringBuilder: void <init>()>();

17.

18. $r2 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>("oldpassword [");

19.

20. $r3 = specialinvoke

$r0.<net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingSer

vice: java.lang.String a()>();

21.

78

22. $r2 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>($r3);

23.

24. $r2 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.StringBuilder

append(java.lang.String)>("]");

25.

26. $r3 = virtualinvoke $r2.<java.lang.StringBuilder: java.lang.String

toString()>();

27.

28. staticinvoke <android.util.Log: int

i(java.lang.String,java.lang.String)>("MyFirebaseMsgService", $r3);

29.

30. staticinvoke <android.util.Log: int

i(java.lang.String,java.lang.String)>("MyFirebaseMsgService", "oldpassword and

newpassword matches. save new password");

31.

32. $r4 =

<net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService

: net.intricare.gobrowserkiosklockdown.b.b a>;

33.

34. $i0 = virtualinvoke $r4.<net.intricare.gobrowserkiosklockdown.b.b: int

c(java.lang.String)>($r1);

35.

36. $r4 =

<net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService

: net.intricare.gobrowserkiosklockdown.b.b a>;

37.

38. virtualinvoke $r4.<net.intricare.gobrowserkiosklockdown.b.b: void b()>();

39.

40. if $i0 <= 0 goto label1;

79

41.

42. return 1;

43.

44. label1:

45. return 0;

46. }

47. -------------------------------------------------------------------------------------

48. Class:

net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService

49. private java.lang.String a()

50. {

51.

net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService

$r0;

52. java.lang.String $r1;

53. net.intricare.gobrowserkiosklockdown.b.b $r2;

54.

55. $r0 := @this:

net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService;

56.

57. $r2 =

<net.intricare.gobrowserkiosklockdown.firebasenotify.MyFirebaseMessagingService

: net.intricare.gobrowserkiosklockdown.b.b a>;

58.

59. $r1 = virtualinvoke $r2.<net.intricare.gobrowserkiosklockdown.b.b:

java.lang.String e()>();

60.

61. return $r1;

62. }

63. -------------------------------------------------------------------------------------

80

64. Class: net.intricare.gobrowserkiosklockdown.b.b

65. public java.lang.String e()

66. {

67. net.intricare.gobrowserkiosklockdown.b.b $r0;

68. android.database.sqlite.SQLiteDatabase $r1;

69. java.lang.String[] $r2;

70. android.database.Cursor $r3;

71. boolean $z0;

72. int $i0;

73. java.lang.String $r4;

74.

75. $r0 := @this: net.intricare.gobrowserkiosklockdown.b.b;

76.

77. $r1 = $r0.<net.intricare.gobrowserkiosklockdown.b.b:

android.database.sqlite.SQLiteDatabase b>;

78.

79. $r2 = newarray (java.lang.String)[1];

80.

81. $r2[0] = "password";

82.

83. $r3 = virtualinvoke $r1.<android.database.sqlite.SQLiteDatabase:

android.database.Cursor

query(java.lang.String,java.lang.String[],java.lang.String,java.lang.String[],j

ava.lang.String,java.lang.String,java.lang.String)>("login", $r2, null, null,

null, null, null);

84.

85. $z0 = interfaceinvoke $r3.<android.database.Cursor: boolean moveToFirst()>();

86.

87. if $z0 == 0 goto label1;

88.

81

89. $i0 = interfaceinvoke $r3.<android.database.Cursor: int

getColumnIndex(java.lang.String)>("password");

90.

91. $r4 = interfaceinvoke $r3.<android.database.Cursor: java.lang.String

getString(int)>($i0);

92.

93. return $r4;

94.

95. label1:

96. return null;

97. }

82

Bibliography

Android 17 “Android, Log,” Android Official Developers Online Document.

https://developer.android.com/reference/android/util/Log (accessed 2017).

Arzt 14 S. Arzt, S. Rasthofer, C. Fritz, E. Bodden, and A. Bartel, “FlowDroid: precise

context, flow, field, object-sensitive and lifecycle-aware taint analysis for Android

apps,” in Proceedings of the 35th ACM SIGPLAN Conference on Programming

Language Design and Implementation - PLDI ’14, Edinburgh, United Kingdom,

2013, pp. 259–269.

Asf 16 “Welcome to The Apache Software Foundation!” https://www.apache.org/

(accessed Sep. 01, 2020).

Bartel 12 A. Bartel, J. Klein, M. Monperrus, and Y. L. Traon, “Dexpler: Converting Android

Dalvik Bytecode to Jimple for Static Analysis with Soot,” Proceedings of the ACM

SIGPLAN International Workshop on State of the Art in Java Program analysis -

SOAP ’12, pp. 27–38

Cai 17 H. Cai and B. G. Ryder, “DroidFax: A Toolkit for Systematic Characterization of

Android Applications,” in 2017 IEEE International Conference on Software

Maintenance and Evolution (ICSME), Sep. 2017, pp. 643–647

Chen 17a B. Chen and Z. M. (Jack) Jiang, “Characterizing logging practices in Java-based

open source software projects – a replication study in Apache Software Foundation,”

Empir Software Eng, vol. 22, no. 1, Feb. 2017, pp. 330–374

83

Chen 17b B. Chen and Z. M. (Jack) Jiang, “Characterizing and Detecting Anti-patterns in the

Logging Code,” in Proceedings of the 39th International Conference on Software

Engineering, Piscataway, NJ, USA, 2017, pp. 71–81.

Cinque 10 M. Cinque, D. Cotroneo, R. Natella, and A. Pecchia, “Assessing and improving the

effectiveness of logs for the analysis of software faults,” in 2010 IEEE/IFIP

International Conference on Dependable Systems Networks (DSN), Jun. 2010, pp.

457–466.

Cinque 12 M. Cinque, D. Cotroneo, and A. Pecchia, “Event Logs for the Analysis of Software

Failures: A Rule-Based Approach,” IEEE Transactions on Software Engineering,

vol. 39, no. 6, Jun. 2013, pp. 806–821.

Ding 15 R. Ding, H. Zhou, J. Lou, H. Zhang, and Q. Lin, “Log2: a cost-aware logging

mechanism for performance diagnosis,” in Proceedings of the 2015 USENIX

Conference on Usenix Annual Technical Conference, USA, Jul. 2015, pp. 139–150.

Durase 06 J. A. Duraes and H. S. Madeira, “Emulation of Software Faults: A Field Data Study

and a Practical Approach,” IEEE Transactions on Software Engineering, vol. 32,

no. 11, pp. 849–867, Nov. 2006.

El-Masri 20 D. El-Masri, F. Petrillo, Y.-G. Guéhéneuc, A. Hamou-Lhadj, and A. Bouziane, “A

systematic literature review on automated log abstraction techniques,” Information

and Software Technology, vol. 122, p. 106276, Jun. 2020.

F-Droid 17 “F-Droid - Free and Open Source Android App Repository.” https://f-droid.org/en/

(accessed 2017).

Feng 17 P. Feng, J. Ma, and C. Sun, “Selecting Critical Data Flows in Android Applications

for Abnormal Behavior Detection,” Mobile Information Systems, Apr. 30, 2017.

84

Fu 14 Q. Fu et al., “Where do developers log? an empirical study on logging practices in

industry,” in Companion Proceedings of the 36th International Conference on

Software Engineering, New York, NY, USA, May 2014, pp. 24–33.

Hamou-Lhadj 02 A. Hamou-Lhadj and T. C. Lethbridge, “Compression techniques to

simplify the analysis of large execution traces,” in Proc. of the10th International

Workshop on Program Comprehension, 2002, pp. 159–168.

Hamou-Lhadj 06 A. Hamou-Lhadj and T. Lethbridge, “Summarizing the content of large

traces to facilitate the understanding of the behaviour of a software system,” in Proc.

of the 14th IEEE International Conference on Program Comprehension (ICPC’06),

2006, pp. 181–190.

Islam 18 Md S. Islam, W. Khreich, and A. Hamou-Lhadj, “Anomaly detection techniques

based on kappa-pruned ensembles,” IEEE Transactions on Reliability, 67(1), 2018,

pp. 212–229.

Hassan 08 A. E. Hassan, D. J. Martin, P. Flora, P. Mansfield, and D. Dietz, “An Industrial

Case Study of Customizing Operational Profiles Using Log Compression,” in

Proceedings of the 30th international conference on Software engineering, New

York, NY, USA, May 2008, pp. 713–723.

He 18 P. He, Z. Chen, S. He, and M. R. Lyu, “Characterizing the natural language

descriptions in software logging statements,” in Proceedings of the 33rd

ACM/IEEE International Conference on Automated Software Engineering, New

York, NY, USA, Sep. 2018, pp. 178–189.

Huang 14a W. Huang, Y. Dong, and A. Milanova, “Type-Based Taint Analysis for Java Web

Applications,” in Proceedings of the 17th International Conference on Fundamental

85

Approaches to Software Engineering - Volume 8411, Berlin, Heidelberg, Apr.

2014, pp. 140–154.

Huang 14b J. Huang, X. Zhang, L. Tan, P. Wang, and B. Liang, “AsDroid: detecting stealthy

behaviors in Android applications by user interface and program behavior

contradiction,” in Proceedings of the 36th International Conference on Software

Engineering, New York, NY, USA, May 2014, pp. 1036–1046.

Huang 16 J. Huang, X. Zhang, and L. Tan, “Detecting sensitive data disclosure via bi-

directional text correlation analysis,” in Proceedings of the 2016 24th ACM

SIGSOFT International Symposium on Foundations of Software Engineering, New

York, NY, USA, Nov. 2016, pp. 169–180.

Jdt 15 J. team, “Eclipse Java development tools (JDT) | The Eclipse Foundation.”

https://www.eclipse.org/jdt/ (accessed 23 October 2015).

Kabinna 16 S. Kabinna, C.-P. Bezemer, W. Shang, and A. E. Hassan, “Logging library

migrations: a case study for the apache software foundation projects,” in

Proceedings of the 13th International Conference on Mining Software Repositories,

New York, NY, USA, May 2016, pp. 154–164.

Kernighan 99 B. W. Kernighan and R. Pike, The practice of programming. Reading, MA:

Addison-Wesley, 1999.

Khanmohammadi 19 K. Khanmohammadi, N. Ebrahimi, A. Hamou-Lhadj, and R. Khoury,

“Empirical study of android repackaged applications,” Empir Software Eng, vol.

24, no. 6, pp. 3587–3629, Dec. 2019.

86

Khatuya 18 S. Khatuya, N. Ganguly, J. Basak, M. Bharde, and B. Mitra, “ADELE: Anomaly

Detection from Event Log Empiricism,” in IEEE INFOCOM 2018 - IEEE

Conference on Computer Communications, Apr. 2018, pp. 2114–2122.

Klieber 14 W. Klieber, L. Flynn, A. Bhosale, L. Jia, and L. Bauer, “Android taint flow analysis

for app sets,” in Proceedings of the 3rd ACM SIGPLAN International Workshop

on the State of the Art in Java Program Analysis - SOAP ’14, Edinburgh, United

Kingdom, 2014, pp. 1–6.

Le 15 T.-D. B. Le, X.-B. D. Le, D. Lo, and I. Beschastnikh, “Synergizing Specification

Miners through Model Fissions and Fusions (T),” in 2015 30th IEEE/ACM

International Conference on Automated Software Engineering (ASE), Nov. 2015,

pp. 115–125.

Li 17a H. Li, W. Shang, and A. E. Hassan, “Which log level should developers choose for

a new logging statement? (journal-first abstract),” in 2018 IEEE 25th International

Conference on Software Analysis, Evolution and Reengineering (SANER), Mar.

2018, pp. 468–468.

Li 17b L. Li et al., “AndroZoo++: Collecting Millions of Android Apps and Their

Metadata for the Research Community,” arXiv:1709.05281 [cs], Sep. 2017.

Li 19 Z. Li, T.-H. Chen, J. Yang, and W. Shang, “DLFinder: Characterizing and

Detecting Duplicate Logging Code Smells,” in 2019 IEEE/ACM 41st International

Conference on Software Engineering (ICSE), Montreal, QC, Canada, May 2019,

pp. 152–163.

Lu 12 L. Lu, Z. Li, Z. Wu, W. Lee, and G. Jiang, “CHEX: statically vetting Android apps

for component hijacking vulnerabilities,” in Proceedings of the 2012 ACM

87

conference on Computer and communications security, New York, NY, USA, Oct.

2012, pp. 229–240.

Mariani 08 L. Mariani and F. Pastore, “Automated Identification of Failure Causes in System

Logs,” in 2008 19th International Symposium on Software Reliability Engineering

(ISSRE), Nov. 2008, pp. 117–126.

Miransky 16 A. Miranskyy, A. Hamou-Lhadj, E. Cialini, and A. Larsson, “Operational-Log

Analysis for Big Data Systems: Challenges and Solutions,” IEEE Software, vol. 33,

no. 2, pp. 52–59, Mar. 2016.

OWASP 16 “Mobile Top 10 2016-Top 10 - OWASP.”

https://www.owasp.org/index.php/Mobile_Top_10_2016-Top_10 (accessed 2016).

Pecchia 15 A. Pecchia, M. Cinque, G. Carrozza, and D. Cotroneo, “Industry Practices and

Event Logging: Assessment of a Critical Software Development Process,” in 2015

IEEE/ACM 37th IEEE International Conference on Software Engineering, May

2015, vol. 2, pp. 169–178.

Rahul 13 R. Pandita, X. Xiao, W. Yang, W. Enck, and T. Xie, “WHYPER: Towards

Automating Risk Assessment of Mobile Applications,” p. 16.

Rasthofer 14 S. Rasthofer, S. Arzt, and E. Bodden, “A Machine-learning Approach for

Classifying and Categorizing Android Sources and Sinks,” presented at the

Network and Distributed System Security Symposium, San Diego, CA, 2014.

Shang 14 W. Shang, M. Nagappan, A. E. Hassan, and Z. M. Jiang, “Understanding Log Lines

Using Development Knowledge,” in 2014 IEEE International Conference on

Software Maintenance and Evolution, Sep. 2014, pp. 21–30.

88

Xdadevelopers 19 “How to take logs in Android,” xda-developers. https://www.xda-

developers.com/how-to-take-logs-in-android/ (accessed 2019).

Xu 09 W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale

system problems by mining console logs,” in Proceedings of the ACM SIGOPS

22nd symposium on Operating systems principles, New York, NY, USA, Oct. 2009,

pp. 117–132.

Yen 13 T.-F. Yen et al., “Beehive: large-scale log analysis for detecting suspicious activity

in enterprise networks,” in Proceedings of the 29th Annual Computer Security

Applications Conference on - ACSAC ’13, New Orleans, Louisiana, 2013, pp.

199–208.

Yuan 12a D. Yuan, S. Park, and Y. Zhou, “Characterizing logging practices in open-source

software,” in 2012 34th International Conference on Software Engineering (ICSE),

Zurich, Jun. 2012, pp. 102–112.

Yuan 12b D. Yuan, J. Zheng, S. Park, Y. Zhou, and S. Savage, “Improving software

diagnosability via log enhancement,” in Proceedings of the sixteenth international

conference on Architectural support for programming languages and operating

systems, New York, NY, USA, Mar. 2011, pp. 3–14.

Zeng 19 Y. Zeng, J. Chen, W. Shang, and T.-H. (Peter) Chen, “Studying the characteristics

of logging practices in mobile apps: a case study on F-Droid,” Empir Software Eng,

Feb. 2019.

Zhu 15 J. Zhu, P. He, Q. Fu, H. Zhang, M. R. Lyu, and D. Zhang, “Learning to Log:

Helping Developers Make Informed Logging Decisions,” in 2015 IEEE/ACM 37th

89

IEEE International Conference on Software Engineering, Florence, Italy, May

2015, pp. 415–425.

Zhou 20 R. Zhou, M. Hamdaqa, H. Cai, and A. Hamou-Lhadj, “MobiLogLeak: A

Preliminary Study on Data Leakage Caused by Poor Logging Practices,” in Proc.

of the IEEE 27th International Conference on Software Analysis, Evolution and

Reengineering (SANER'20), 2020, pp. 577–581.