

Auditing Algorithms: Towards Transparency in the Age of Big Data

Christo Wilson, Assistant Professor @ Northeastern University, cbw@ccs.neu.edu

Personalization on the Web

Santa Barbara, California / Amherst, Massachusetts

Personalization is Ubiquitous

Search Results

Goods and Services

Music, Movies, Media

Social Media

Dangers of Personalization?

Racial Discrimination

Chris Wilson

Ad: Looking for Chris Wilson?

Find People Near You! www.yellowpages.com

Trevon Jones

Ad: Trevon Jones, Arrested?

Search Criminal Records, Sex Offender Registry, and More.

www.instantcheckmate.com

Racial bias in Google's AdSense system uncovered by Latanya Sweeney in 2013

Example of unintended consequences of big data:
• People exhibit racial bias in their search and click patterns
• The ad-placement algorithm observed and learned these behaviors

Price Discrimination

Showing users different prices (in economics: differential pricing)

Example: Amazon in 2001, DVDs were sold for $3–4 more to some users

Surprisingly, this is not illegal in the US: the Anti-Discrimination Act does not protect consumers

Article 20(2) of the Services Directive protects EU residents, but companies seem to be flouting the regulation :(

Websites Vary Prices, Deals Based on Users' Information

Price Steering

Altering the order or composition of products, e.g., high-priced items rank higher for some people

Example: Orbitz in 2012. Users received hotels in a different order when searching:
• Normal users: cheap hotels first
• Mac users: expensive hotels first

On Orbitz, Mac Users Steered to Pricier Hotels

Auditing Algorithms

Governments and regulators are concerned about big data and algorithms
• White House reports: "Big Data: Seizing Opportunities, Preserving Values" and "Big Data and Differential Pricing"

FTC's new Office of Technology Research and Investigation, tasked with monitoring the applications of big data and algorithms

How do we measure and understand algorithms?
• Algorithms may be trade secrets and are constantly changing
• Access to source code is not enough; data is equally important

Emerging scientific area: Auditing Algorithms

Goals of Our Work

1. Understanding how companies collect and share data about users: online and offline retailers, advertisers and marketers, data brokers like Acxiom, Datalogix, Equifax, Experian, etc.

2. Reverse-engineering online algorithms to assess their impact: search engines, online advertisements, e-commerce, social networks, etc.

Outline: Measuring Personalization; Case Study: Google Search; Case Study: E-commerce

Measuring Personalization

Are All Differences Personalization?

[Figure: two result pages for the same search; one lists Products 1, 2, 4, 3 and the other Products 2, 1, 3, 4. Compare.]

Not necessarily! It could be:
• Updates to inventory/prices
• Tax/shipping differences
• Distributed infrastructure
• Load-balancing

How can we reliably identify and quantify personalization?

Personalization?

Controlling for Noise

[Diagram: clients at 129.10.115.14, 129.10.115.15, and 129.10.115.16 send the same query to the server at 74.125.225.67 and receive product lists; one difference is labeled "Noise"]

• Queries run at the same time
• Same Amazon IP address
• IP addresses in the same /24

Difference – Noise = Personalization
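A minimal sketch of "Difference – Noise = Personalization", assuming each query yields a ranked list of product IDs for one test account and two control accounts queried at the same time from the same /24. The helper names (`fraction_changed`, `estimate_personalization`) and the position-by-position comparison are illustrative assumptions, not the study's exact metric.

```python
from typing import Sequence


def fraction_changed(a: Sequence[str], b: Sequence[str]) -> float:
    """Fraction of rank positions at which two result lists differ."""
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    # Pad the shorter list with None so a missing result counts as a change.
    pa = list(a) + [None] * (n - len(a))
    pb = list(b) + [None] * (n - len(b))
    return sum(x != y for x, y in zip(pa, pb)) / n


def estimate_personalization(test: Sequence[str],
                             control1: Sequence[str],
                             control2: Sequence[str]) -> float:
    """Difference (test vs. control) minus noise (control vs. control)."""
    difference = fraction_changed(test, control1)
    noise = fraction_changed(control1, control2)
    # Clamp at zero: if the controls disagree more than the test account does,
    # there is no measurable personalization in this sample.
    return max(0.0, difference - noise)
```

Clamping at zero is a design choice for the sketch; an analysis averaging over many queries might prefer to keep signed values and clamp only the aggregate.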

Dual Methodology

REAL USER ACCOUNTS
• Leverage real user accounts with lots of history
• Measure personalization in real life

SYNTHETIC USER ACCOUNTS
• Create accounts that each vary by one feature
• Measure the impact of specific features

Questions we want to answer:
1. To what extent is content personalized?
2. What user features drive personalization?

Real User Experiment

Task on Amazon Mechanical Turk (AMT)
• Thousands of participants
• Each executed hundreds of search queries
• Every query paired with two control queries, run from empty accounts (i.e., no history) to provide baseline results for comparison

[Diagram: an HTTP proxy issues each user query alongside two control queries]

Case Study: Google Search

Results from Real Users

[Figure: results changed (%) by search result rank (1–10), for Control/Control and Real User/Control pairs; the difference between the two curves is personalization]

Top ranks are less personalized; lower ranks are more personalized.

• On average, real users have a 12% higher chance of differing than the controls
• Most changes are due to location
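The per-rank curves in the figure can be approximated with a short sketch, assuming each AMT query is logged as a triple of ranked result lists (real user, control 1, control 2). The data layout and function names are hypothetical; `depth=10` simply mirrors the figure's ten ranks.

```python
from typing import List, Optional, Sequence, Tuple

# One observation per query: (real user results, control 1 results, control 2 results).
Observation = Tuple[Sequence[str], Sequence[str], Sequence[str]]


def _at(results: Sequence[str], rank: int) -> Optional[str]:
    """Result at a 0-based rank, or None if the page has fewer results."""
    return results[rank] if rank < len(results) else None


def percent_changed_by_rank(observations: List[Observation], depth: int = 10):
    """Per rank, the % of queries whose result differs for control/control
    pairs and for real-user/control pairs."""
    control_control = [0] * depth
    user_control = [0] * depth
    for user, c1, c2 in observations:
        for r in range(depth):
            control_control[r] += _at(c1, r) != _at(c2, r)
            user_control[r] += _at(user, r) != _at(c1, r)
    n = max(len(observations), 1)  # avoid division by zero on empty input
    return ([100 * x / n for x in control_control],
            [100 * x / n for x in user_control])
```

Subtracting the control/control value from the real-user/control value at each rank gives a per-rank estimate of personalization.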

What Causes Personalization?

AMT results reveal extensive personalization. Next question: what user features drive this?

Historical Features
• Logged In/Out
• History of Searches
• History of Search Result Clicks
• Browsing History

Static Features
• Gender
• Age
• Browser
• Operating System
• Location (IP Address)
• Logged In/Out

Methodology: use synthetic (fake) accounts

Logged In/Out to Google

[Figure: average Jaccard index (0.5–1) and average edit distance (0–5) over days 1–7, comparing No Cookies / No Cookies, Logged In / No Cookies, and Logged Out / No Cookies]

Same results… but in a different order
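The two metrics in the figure can be sketched as follows, assuming each search result page is a list of result URLs. The Jaccard index compares the sets of results (ignoring order), while edit distance also penalizes reordering; here edit distance is plain Levenshtein distance over list elements, which may differ from the exact variant used in the study.

```python
from typing import Sequence


def jaccard_index(a: Sequence[str], b: Sequence[str]) -> float:
    """|A ∩ B| / |A ∪ B| over the sets of results; 1.0 means identical sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two result lists
    (insertions, deletions, and substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = curr
    return prev[-1]
```

For two adjacent results swapped, the Jaccard index stays at 1.0 while the edit distance is 2, which is the "same results… but in a different order" pattern on the slide.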

IP Address Geolocation

[Figure: Jaccard index (0.5–1) and average edit distance (0–5) over days 1–7, comparing MA / MA, CA / MA, UT / MA, IL / MA, and NC / MA]

On average, 1 different result…

…plus 1 pair of reordered results

What About Search History?

Search for 'healthcare' vs. search for 'obama', then 'healthcare'

Subsequent queries may "carry over"

Impact of Search History

[Figure: overlap in results when searching for 'healthcare' and 'obama' + 'healthcare': average Jaccard index (0–1) vs. time between queries (0–20 minutes), with a 10-minute cutoff]
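A minimal sketch of how an audit harness might respect the carry-over effect, assuming that a query can only influence a later one when both are issued less than 10 minutes apart (the cutoff in the figure). `CARRY_OVER_CUTOFF`, the `(timestamp, query)` log format, and the one-minute safety margin in `next_safe_time` are hypothetical choices.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumption: queries issued this close together may "carry over".
CARRY_OVER_CUTOFF = timedelta(minutes=10)


def carry_over_pairs(log: List[Tuple[datetime, str]]) -> List[Tuple[str, str]]:
    """Return (earlier, later) query pairs issued less than the cutoff apart."""
    ordered = sorted(log)
    pairs = []
    for i, (t_later, q_later) in enumerate(ordered):
        # Check every earlier query, not just the immediately preceding one.
        for t_earlier, q_earlier in ordered[:i]:
            if t_later - t_earlier < CARRY_OVER_CUTOFF:
                pairs.append((q_earlier, q_later))
    return pairs


def next_safe_time(last_query: datetime,
                   margin: timedelta = timedelta(minutes=1)) -> datetime:
    """Earliest time to issue the next query without carry-over risk."""
    return last_query + CARRY_OVER_CUTOFF + margin
```

A scheduler that always waits until `next_safe_time` before the next query yields an empty `carry_over_pairs` list for its own log.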

Case Study: E-commerce

Targeted Retailers

10 general retailers: Best Buy, CDW, Home Depot, JCPenney, Macy's, NewEgg, Office Depot, Sears, Staples, Walmart
• Focus on products returned by searches, 20 search terms per site

6 travel sites (hotels & car rental): CheapTickets, Expedia, Hotels.com, Priceline, Orbitz, Travelocity

Do Users See the Same Prices for the Same Products?

Many sites show inconsistencies for real users: up to 3.6% of all products

[Figure: % of products with inconsistent prices, for Retailers, Hotels, and Rental Cars]
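One way to turn raw price observations into a "% of products with inconsistent prices" figure, as a sketch under stated assumptions: each product is observed by a real user and two controls at the same time, prices are stored as integer cents, and a product counts as inconsistent only when the two controls agree with each other but the user's price differs (so the gap cannot be explained by noise). The data layout and this exact rule are assumptions, not the study's definition.

```python
from typing import Dict, Tuple

# Hypothetical layout: product_id -> (user_price, control1_price, control2_price), in cents.
Prices = Dict[str, Tuple[int, int, int]]


def percent_inconsistent(prices: Prices) -> float:
    """Percentage of products whose user price differs from agreeing controls."""
    if not prices:
        return 0.0
    inconsistent = sum(
        1 for user, c1, c2 in prices.values()
        if c1 == c2 and user != c1  # controls agree, so the gap is not noise
    )
    return 100 * inconsistent / len(prices)
```

Storing prices as integer cents keeps the equality checks exact, which floating-point dollar amounts would not guarantee.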

How Much Money Are We Talking About?

[Figure: difference in $ (0–1000) for inconsistent prices at Retailers, Hotels, and Rental Cars, showing the 5th, 25th, mean, 75th, and 95th percentiles]

Inconsistencies can be $100s! (per day/night for hotels/cars)
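The percentile summary in the figure can be reproduced from a list of per-product price differences with a short sketch. It assumes differences are absolute dollar amounts and uses the nearest-rank percentile definition; plotting tools often interpolate between values instead, so exact figures may differ slightly.

```python
from statistics import mean
from typing import Dict, List, Sequence


def nearest_rank(sorted_vals: Sequence[float], p: int) -> float:
    """Nearest-rank percentile of an ascending list, for integer p in 1..100."""
    k = max(1, -(-p * len(sorted_vals) // 100))  # integer ceil(p * n / 100)
    return sorted_vals[k - 1]


def summarize_differences(diffs: List[float]) -> Dict[str, float]:
    """5th, 25th, mean, 75th, and 95th percentile of absolute price differences."""
    vals = sorted(abs(d) for d in diffs)
    if not vals:
        raise ValueError("need at least one price difference")
    return {
        "5th": nearest_rank(vals, 5),
        "25th": nearest_rank(vals, 25),
        "mean": mean(vals),
        "75th": nearest_rank(vals, 75),
        "95th": nearest_rank(vals, 95),
    }
```

The ceiling is computed with integer arithmetic to avoid floating-point rounding at exact percentile boundaries.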

What Features Trigger Personalization?

Methodology: use synthetic (fake) accounts
• Give them different features, look for personalization
• Each day for 1 month, run a standard set of searches

Tested features, by category and feature:
• Account / Cookie: No Account, Logged In, No Cookies
• User-Agent / OS: Win XP, Win 7, OS X, Linux
• User-Agent / Browser: Chrome 33, Android Chrome 34, IE 8, Firefox 25, Safari 7, iOS Safari 6
• History / Click: Big Spender, Low Spender
• History / Purchase: Big Spender, Low Spender
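A sketch of how the synthetic-account matrix above might be generated, assuming one baseline account plus one treatment account per tested value, each differing from the baseline in exactly one feature. The `Account` fields, the baseline values, and the merging of the two History rows into a single `history` field are hypothetical simplifications; a real harness would also keep User-Agent strings internally consistent (e.g., iOS Safari paired with iOS rather than with the baseline OS).

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class Account:
    cookie: str
    os: str
    browser: str
    history: str


# Hypothetical baseline: every treatment differs from it in one field only.
BASELINE = Account(cookie="No Cookies", os="Win 7", browser="Chrome 33", history="None")

TESTED: Dict[str, List[str]] = {
    "cookie": ["No Account", "Logged In", "No Cookies"],
    "os": ["Win XP", "Win 7", "OS X", "Linux"],
    "browser": ["Chrome 33", "Android Chrome 34", "IE 8", "Firefox 25",
                "Safari 7", "iOS Safari 6"],
    "history": ["Click: Big Spender", "Click: Low Spender",
                "Purchase: Big Spender", "Purchase: Low Spender"],
}


def one_feature_variants(baseline: Account,
                         tested: Dict[str, List[str]]) -> List[Account]:
    """One treatment account per tested value that differs from the baseline."""
    variants = []
    for field_name, values in tested.items():
        for value in values:
            if value != getattr(baseline, field_name):
                variants.append(replace(baseline, **{field_name: value}))
    return variants
```

Each treatment account would then run the standard search set daily alongside the baseline, so a consistent difference can be attributed to the single feature that changed.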

Home Depot

Smartphone users see totally different products than desktop users

7% of products have different prices on Android…

…but the prices only go up by $0.50 on average

Travel Sites

CheapTickets and Orbitz offer lower prices on hotels for users who log in to the sites: 1 hotel per page, $12 off per night on average

Travelocity offers discounts on hotels for users on mobile devices: 1 hotel per page, $15 off per night on average

Priceline changes the order of search results based on click and purchase history. Example of price steering:
• 2 accounts click/reserve high-price hotels
• 2 accounts click/reserve low-price hotels
• 2 accounts do nothing

CheapTickets/Orbitz

CheapTickets and Orbitz offer lower prices on hotels for users who log in to the sites
• About 1 hotel per page has a lower price
• Prices drop by around $12 per night

[Figure: average price difference ($)]

Travelocity

Travelocity offers discounts on hotels for users on mobile devices
• iOS users see different hotels
• About 1 hotel per page has a lower price
• Price drops by around $15/night

Priceline

Priceline changes the order of search results based on click and purchase history
• 2 accounts click/reserve high-price hotels
• 2 accounts click/reserve low-price hotels
• 2 accounts do nothing
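The slide does not say how the reordering was measured; one common choice, used here purely as an assumption, is Kendall's tau between the hotel order shown to an account with a high-price history and the order shown to a "do nothing" account for the same search. The hotel IDs and the function name are hypothetical, and the lists are assumed to contain no duplicates.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(order_a: Sequence[str], order_b: Sequence[str]) -> float:
    """Kendall rank correlation over hotels that appear in both lists.
    1.0 means the same order, -1.0 a fully reversed order."""
    in_b = set(order_b)
    common = [h for h in order_a if h in in_b]
    pos_a = {h: i for i, h in enumerate(order_a)}
    pos_b = {h: i for i, h in enumerate(order_b)}
    pairs = list(combinations(common, 2))
    if not pairs:
        return 1.0  # fewer than two shared hotels: no ordering evidence
    # A pair is concordant if both lists rank it in the same relative order.
    concordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) > 0
    )
    discordant = len(pairs) - concordant
    return (concordant - discordant) / len(pairs)
```

Averaging tau over searches for each group pair (high-price vs. no history, low-price vs. no history, and the two no-history accounts as a noise baseline) would show whether history-based reordering exceeds background churn.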

Hotels.com/Expedia

Hotels.com and Expedia are conducting large-scale A/B tests on their users
• When you visit the site, you are randomly placed in a "bucket"
• 2 out of 3 buckets see high-price hotels at the top of search results
• The remaining bucket sees low-price hotels at the top of the page

Exemplifies price steering

The only way to see the hidden hotel results is to clear your cookies and reload the site
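The bucket mechanism can be illustrated with a hypothetical sketch: assume the site derives a bucket from a cookie ID by hashing, so the same cookie always lands in the same bucket, and clearing cookies (getting a new ID) re-rolls the assignment. Nothing here reflects Hotels.com's or Expedia's actual implementation; `NUM_BUCKETS`, `HIGH_PRICE_FIRST_BUCKETS`, and the `(name, price)` hotel tuples are assumptions chosen to match the 2-out-of-3 split on the slide.

```python
import hashlib
from typing import List, Tuple

NUM_BUCKETS = 3
HIGH_PRICE_FIRST_BUCKETS = {0, 1}  # 2 out of 3 buckets see pricier hotels first

Hotel = Tuple[str, float]  # (name, nightly price)


def bucket_for(cookie_id: str) -> int:
    """Deterministically map a cookie ID to a bucket in [0, NUM_BUCKETS)."""
    digest = hashlib.sha256(cookie_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def rank_hotels(hotels: List[Hotel], cookie_id: str) -> List[Hotel]:
    """Order hotels by price, descending for 'high-price first' buckets."""
    high_first = bucket_for(cookie_id) in HIGH_PRICE_FIRST_BUCKETS
    return sorted(hotels, key=lambda h: h[1], reverse=high_first)
```

Because the assignment depends only on the cookie, reloading with the same cookie keeps the same order, which matches the observation that clearing cookies is the only way to see the other ordering.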

Conclusions and Future Work

The Era of Big Data

Algorithms driven by big data shape your world:
• The search results you are given
• The prices and products you are shown
• Movie, music, and book recommendations
• The directions you use to drive

In many cases, these systems are wonderful

In other cases, they may be detrimental:
• Unintended consequences
• Intentional manipulation

• Eligibility for social services
• Access to credit and banking
• Allocation of police forces

Our Goal: Transparency

Personalization is problematic when it is not transparent:
• How is data being collected and shared?
• How is data being used to alter content?

Use algorithm audits to investigate deployed systems and assess their impact

Our goal is to increase transparency:
• Building tools to help users and regulators
• Reverse-engineering systems to understand how they work
• Raising public awareness of these issues

Peeking Beneath the Hood of Uber

Borders on Google Maps

Discrimination in the Gig Economy

All of our code, data, and papers are available at:

http://personalization.ccs.neu.edu
