studying cybercrime: raising awareness of objectivity & bias

Studying Cybercrime: Raising Questions About Objectivity & Bias

Presented by:

Kristine Gloria, The Tetherless World Constellation

Rensselaer Polytechnic Institute, Troy, NY

With thanks to co-author John S. Erickson and the extended RPI Tetherless World Team

The Process of Web Science

Berners-Lee, T. (2007) W3C Keynote. http://www.w3.org/2007/Talks/0509-www-keynote-tbl/#(10)

Via the workshop call: how can we study the phenomena of cybercrime and cyberwarfare in a way that offers a different perspective from what other disciplines already offer?

• Begin with the cycle: where in the cycle does it make sense to start?

• Move away from just one side of the cycle


How may a Web Scientist explore the topic of cybercrime and cyberwarfare by offering an integrated study of both the social and technical aspects of the phenomena?

Agenda


I. Objectivity & Bias

II. Motivation and example

III. Open questions & future discussion

Objectivity & Bias

Porter (1996) traces multiple interpretations of objectivity, construed to include notions of fairness, mechanical objectivity, and non-subjectivity [1].

Latour’s (2000) critique goes even further, suggesting that “objectivity does not refer to a special quality of the mind . . . but to the presence of objects which have been rendered ‘able’ to object to what is told about them” [2].

The discourse on scientific objectivity and bias has a long, debated history, with varying definitions across multiple disciplines. Vocal critiques of objectivity and bias in social science have placed it in a contentious position, challenging the need for objectivity and implicit biases.

So again, I turn back to this cycle. What Web Scientists may uniquely offer is this integrated, multi-disciplinary approach. However, with this integration comes the inheritance of the same critiques leveled at the disciplines that feed and influence the study.

Examples of bias


“Passive” data collection methods in digital social science research are considered by some to be more “objective”: a more “natural” method.

“. . . [that] Facebook ‘big’ data is made by users unaware of or unconcerned about social science researchers doesn’t change the fact it is made through and around a structure engineers have coded.”

Jurgenson, N. (2014). “Short Comment on Facebook as Methodologically ‘More Natural’”. The Society Pages. Website. http://thesocietypages.org/cyborgology/2014/06/09/short-comment-on-facebook-as-methodologically-more-natural/

Current examples of critiques of bias even in technical execution: algorithms and bias; Twitter studies.

Particularly apparent at the sociological level; see work by Kate Crawford, Nathan Jurgenson, and danah boyd.

Kate Crawford’s example of tweets generated during Hurricane Sandy: the data were biased because they did not present the whole picture. The greatest number of tweets came from Manhattan, while few came from areas like Breezy Point, Coney Island and Rockaway. This is the “signal problem”: data are assumed to accurately reflect the social world, but there are significant gaps, with little or no signal coming from particular communities. <http://blogs.hbr.org/2013/04/the-hidden-biases-in-big-data/>

Jurgenson’s comment came in reaction to Facebook’s sociology pre-conference ahead of the American Sociological Association, wherein the claim is that research on such a platform is more “natural”.

Methodological issues: 1) Inadequate attention to the implicit and explicit structural biases of the platform(s) most frequently used to generate datasets.


Examples of bias


Twitter as the “model organism” for multiple research communities [1]

• Due to: data availability, tools availability, simple & clean data structure

• Biased and influenced by message length, rapid turnover, public nature, directed graph interaction

• self-selection bias, signal problems, etc.
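The signal problem named in the last bullet can be made concrete with a small calculation. The sketch below is illustrative only: the area names and all counts are hypothetical placeholders (not figures from Crawford or Tufekci), and `representation_ratio` is a helper invented for this example. It compares each area's share of collected tweets with its share of the population; a ratio well below 1 suggests that the area is sending little signal into the dataset.

```python
from typing import Dict


def representation_ratio(
    tweets: Dict[str, int], population: Dict[str, int]
) -> Dict[str, float]:
    """Return (share of tweets) / (share of population) for each area.

    A value below 1 means the area contributes fewer tweets than its
    population share would suggest.
    """
    total_tweets = sum(tweets.values())
    total_population = sum(population.values())
    if total_tweets == 0 or total_population == 0:
        raise ValueError("tweet and population totals must be non-zero")

    ratios = {}
    for area, residents in population.items():
        if residents == 0:
            continue  # a share of zero population cannot be compared
        tweet_share = tweets.get(area, 0) / total_tweets
        population_share = residents / total_population
        ratios[area] = tweet_share / population_share
    return ratios


# Hypothetical counts, chosen only to illustrate the calculation.
tweets = {"area_a": 900, "area_b": 80, "area_c": 20}
population = {"area_a": 500, "area_b": 300, "area_c": 200}

ratios = representation_ratio(tweets, population)
for area in sorted(ratios):
    label = "under-represented" if ratios[area] < 1 else "not under-represented"
    print(f"{area}: ratio {ratios[area]:.2f} ({label})")
```

A ratio like this only flags gaps relative to a chosen baseline; it does not explain why a community is quiet, which is where the qualitative side of a mixed-methods study comes in.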

[1] Tufekci, Z. (2013). Big Data: Pitfalls, Methods and Concepts for an Emergent Field. SSRN (March 2013). http://bit.ly/1jsN0u5

[2] Rivers, C. M., & Lewis, B. L. (2014). Ethical research standards in a world of big data. F1000Research, 3. http://bit.ly/1i2eyLV

Twitter used as a means for population-level research versus selected individuals [2]

Noted by Zeynep Tufekci (Princeton):

- Twitter has emerged as a “model organism” (one selected for intensive examination by the research community) due to data availability, tool availability and popularity, and a simple, clean data structure.

- However, not all model organisms are representative of their taxa.

- It is influenced by message length, rapid turnover, its public nature, and a directed graph of social network interaction (where one can follow without consent).

- Hashtag usage involves self-selection bias and multiple embedded layers of culture and meaning that are assumed.

- Under non-digital circumstances, IRB/ethical guidelines suggest that collecting information from a public space where people could “reasonably expect to be observed by strangers” is considered appropriate even without informed consent. It could then be reasoned that tweets are texts published for the purpose of sharing with others or the public. (Question: should one still have a reasonable expectation of privacy?)

- It would be unethical for a researcher to follow one specific shopper around the mall and gather data exclusively on that person without his/her consent; however, if the observation is done in aggregate, it is acceptable. Where is this boundary online?


Rally Research Example

Fieldwork: November 2013 in Washington D.C.

Exploratory observational study of rally/protest behavior during last year’s StopWatching.US rally.

- Individual behavior and motivation; identification of authority, power and governance structures; and consideration of technology’s involvement as a propagator and facilitator of information flow.

- “Cybercrime” in this instance was defined not as an action of one nation-state against another, but rather as a single agent’s action against a nation-state, a definition motivated by the U.S. government’s own use of the term in identifying Edward Snowden’s act as a cybercrime.

Reflexivity

Practice of reflexivity

Explicit Bias

Explicit biases, such as the construction of the initial interview questions, were examined first. These questions focused on gathering information related to individual motivation, modes of information propagation, levels of topic comprehension, etc.

Implicit Bias

Implicit biases such as organizational affiliation; self-selection bias.

For this example, the researcher was affiliated with one of the rally organizing groups and had access to non-public information.

One practice:

- Explicit bias

- Implicit bias

How to capture, incorporate and express these biases for later research?
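As one hypothetical way to approach the capture question in these notes, biases identified during reflexive practice could be recorded as structured notes that travel with the dataset. The sketch below is an assumption for discussion, not a method used in the study: `BiasNote`, `ReflexivityLog` and their fields are invented names, and the two entries simply restate the explicit and implicit biases described for the rally example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

ALLOWED_KINDS = ("explicit", "implicit")


@dataclass
class BiasNote:
    """One bias the researcher has identified (hypothetical schema)."""
    kind: str         # "explicit" or "implicit"
    source: str       # where the bias enters the study
    description: str  # free-text account written by the researcher


@dataclass
class ReflexivityLog:
    """A collection of bias notes attached to a single study."""
    study: str
    notes: List[BiasNote] = field(default_factory=list)

    def add(self, kind: str, source: str, description: str) -> None:
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"kind must be one of {ALLOWED_KINDS}")
        self.notes.append(BiasNote(kind, source, description))

    def to_json(self) -> str:
        # asdict() converts the nested dataclasses into plain dicts and lists
        return json.dumps(asdict(self), indent=2)


log = ReflexivityLog(study="Rally fieldwork, November 2013")
log.add(
    "explicit",
    "interview questions",
    "Initial questions focused on motivation, information propagation "
    "and topic comprehension.",
)
log.add(
    "implicit",
    "researcher affiliation",
    "Researcher was affiliated with one of the rally organizing groups "
    "and had access to non-public information.",
)

serialized = log.to_json()
print(serialized)
```

A plain, serializable record like this is only a starting point; whether a fixed schema can express context-dependent biases at all is part of the open question.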

Open Questions/Discussion

What ethical standards will the WS community adopt when exploring online platforms for insight into human behavior?

How do we identify and negotiate the intersections between more descriptive, context-dependent qualitative data and stand-alone graph analysis or quantitative data?

How can we identify, express, capture and share both implicit and explicit biases in our research?

Proponent of mixed methods. The paper was meant to provoke some additional thought; the questions above are among those that arise.

Questions?

Contact: [email protected] or /@gloriakt
