Consumers’ right to their own data is on its way

Good on the ACCC. Someone has our back!

“The consumer data right (CDR), which will enable customers to safely share their data with trusted service providers, is a fundamental competition and consumer reform,” ACCC Chair Rod Sims said in a speech at the Consumer Policy Research Centre’s Consumer Data Conference in Melbourne today.

The ACCC will have the lead role in turning the concept of a consumer data right into a reality, including rule-making, consumer education and, eventually, enforcement.

“The consumer data right is essentially a data portability right,” Mr Sims said.

“We believe it will enable consumers to actually benefit greatly from the data that businesses already hold about them.”

Using banking, the first industry to be designated under the CDR, as an example, Mr Sims explained how existing customer data held by banks can benefit homeowners.

“It is often difficult and costly for borrowers to compare the offers of mortgage providers,” Mr Sims said.

Under the CDR, “banks will make some data, such as customers’ transaction details, available to the customer or the customer’s chosen data recipient.”

“Consumer data rights will reduce the cost to borrowers of discovering and comparing offers,” Mr Sims said.

Mr Sims also stressed the importance of privacy and security in developing the CDR.

The agency will work very closely with the Office of the Australian Information Commissioner on privacy matters.

“Robust privacy protection and information security will be a core feature of the CDR,” Mr Sims said.

The data “can only be accessed by trusted parties who have the customer’s consent to access their data.”

The ACCC has created a dedicated Consumer Data Right Branch and work is already underway with a framework paper on the data rules expected for public consultation in August.

“We will also conduct consultation work with consumers and businesses,” Mr Sims said.

A copy of the speech is available at Consumer data and regulatory reform.

Another Day, Another Data Breach

Reports of data breaches are an increasingly common occurrence. In recent weeks, Ticketmaster, HealthEngine, PageUp and the Tasmanian Electoral Commission have all reported breaches.

It is easy to tune out from what is happening, particularly when the breach wasn’t your fault in the first place.

But there are simple steps you can take to minimise the risk of the problem progressing from “identity compromise” to “identity crime”.

In 2012, then FBI Director Robert Mueller famously said:

I am convinced that there are only two types of companies: those that have been hacked and those that will be. And even they are converging into one category: companies that have been hacked and will be hacked again.

The types of personal information compromised might include names, addresses, dates of birth, credit card numbers, email addresses, usernames and passwords.

In some cases, very sensitive details relating to health and sexuality can be stolen.

What’s the worst that can happen?

In most cases, offenders are looking to gain money. But it’s important to differentiate between identity compromise and identity misuse.

Identity compromise is when your personal details are stolen, but no further action is taken. Identity misuse is more serious. That’s when your personal details are not only breached but are then used to perpetrate fraud, theft or other crimes.

Offenders might withdraw money from your accounts, open up new lines of credit or purchase new services in your name, or port your telecommunication services to another carrier. In worst case scenarios, victims of identity crime might be accused of a crime perpetrated by someone else.

The Australian government estimates that 5% of Australians (approximately 970,000 people) will lose money each year through identity crime, costing at least $2.2 billion annually. And it’s not always reported, so that’s likely a conservative estimate.

While millions of people are exposed to identity compromise, far fewer will actually experience identity misuse.

But identity crime can be a devastating and traumatic event. Victims spend an average of 18 hours repairing the damage and seeking to restore their identity.

It can be very difficult and cumbersome for a person to prove that any actions taken were not of their own doing.

How will I know I’ve been hacked?

Many victims of identity misuse do not realise until they start to receive bills for credit cards or services they don’t recognise, or are denied credit for a loan.

Organisations that hold your data often don’t realise for days, weeks or even months that they have been compromised.

And when hacks do happen, organisations don’t always tell you upfront. The introduction of mandatory data breach notification laws in Australia is a positive step toward making potential victims aware of a data compromise, giving them the power to take action to protect themselves.

What can I do to keep safe?

Most data breaches will not reveal your entire identity but rather expose partial details. However, motivated offenders can use these details to obtain further information.

These offenders view your personal information as a commodity that can be bought, sold and traded for financial reward, so it makes sense to protect it in the same way you would your money.

Here are some precautionary measures you can take to reduce the risks:

  • Always use strong and unique passwords. Many of us reuse passwords across multiple platforms, which means that when one is breached, offenders can access multiple accounts. Consider using a password manager.
  • Set up two-factor authentication where possible on all of your accounts.
  • Think about the information that you share and how it could be pieced together to form a holistic picture of you. For example, don’t use your mother’s maiden name as your personal security question if your entire family tree is available on a genealogy website.

And here’s what to do if you think you have been caught up in a data breach:

  • Change passwords on any account that’s been hacked, and on any other account using the same password.
  • Tell the relevant organisation what has happened. For example, if your credit card details have been compromised, you should contact your bank to cancel the card.
  • Report any financial losses to the Australian Cybercrime Online Reporting Network.
  • Check all your financial accounts and consider getting a copy of your credit report via Equifax, D&B or Experian. You can also put an alert on your name to prevent any future losses.
  • Be alert to any phishing emails. Offenders use creative methods to trick you into handing over personal information that helps them build a fuller profile of you.
  • If your email or social media accounts have been compromised, let your contacts know. They might also be targeted by an offender pretending to be you.
  • You can access personalised support at iDcare, the national support centre for identity crime in Australia and New Zealand.

The vast number of data breaches happening in the world makes it easy to tune them out. But it is important to acknowledge the reality of identity compromise. That’s not to say you need to swear off social media and never fill out an online form. Being aware of the risks and how best to reduce them is an important step toward protecting yourself.

For further information about identity crime you can consult ACORN, Scamwatch, or the Office of the Australian Information Commissioner.

If you are experiencing any distress as a result of identity crime, please contact Lifeline.

Author: Cassandra Cross, Senior Lecturer in Criminology, Queensland University of Technology

Singapore banks will benefit from regulatory push to strengthen artificial intelligence capabilities

From Moody’s

On Monday, the Monetary Authority of Singapore (MAS) announced that it is collaborating with the Economic Development Board (EDB), Infocomm Media Development Authority (IMDA) and Institute of Banking and Finance (IBF) to accelerate the adoption of artificial intelligence (AI) in the financial sector. The four agencies will jointly facilitate research and development of new AI technologies and adoption of AI-enabled products, services and processes. The effort will encompass three key initiatives: developing AI products, matching users and solution providers and strengthening AI capabilities.

The increased use of AI and data analytics by financial institutions, including Singapore’s three largest banks, DBS Bank Ltd., Oversea-Chinese Banking Corp. Ltd. and United Overseas Bank Limited (UOB), will help them achieve greater operational cost efficiencies and tap new revenue opportunities, and is positive for their profitability. In addition, the banks will benefit from a greater number of financial technology (fintech) companies with AI capabilities with which they can work to strengthen their digital transformation.

In the collaborative effort, the EDB will augment MAS’ Artificial Intelligence and Data Analytics programme by providing support for AI solution providers locally and globally to conduct both upstream research and product development activities and create new AI products and services for Singapore’s financial sector. The MAS will work with EDB and IMDA to facilitate link-ups between companies in the financial and technology sectors, and pair local companies seeking AI solutions with credible AI solutions providers. The MAS will work closely with IBF and IMDA to equip financial industry professionals with the necessary skill set to transition into new jobs arising from the use of AI in financial services.

As part of their digital transformation, the three Singapore banks already have adopted the use of AI and analytics across various parts of their organizations and businesses. According to the banks’ managements, leveraging AI technology, for instance for machine learning and data analytics, has allowed them to automate repetitive and time-consuming manual tasks and processes, strengthen their risk management capabilities in handling complex surveillance activities and improve the productivity of their sales force and marketing efforts.

In November 2017, UOB reported that it adopted robotic process automation to handle repetitive data entry and computation tasks for its trade finance operations and retail unsecured loan processing function, which substantially cut processing times compared with completing the tasks manually. Also in November 2017, OCBC unveiled its plans to work with fintech company ThetaRay to implement an algorithm-based solution to detect suspicious transactions in its anti-money laundering monitoring. According to the bank, the accuracy of identifying suspicious transactions increased by more than four times using the new technology.

OCBC also set up an AI-powered chatbot application in April 2017 that is able to address customer questions and compute debt-servicing requirements. The application managed to convert customer enquiries into new loan approvals totaling more than SGD100 million in 2017.

DBS reported that its sales productivity improved after relationship managers were provided with customer analytics on a mobile platform, raising the income per head by 57% over three years.

We expect Singapore’s banks to remain committed to their digital growth strategies to keep pace with customer expectations for more digital services and solutions, and remain competitive given the increasing number of fintech companies in the ecosystem. At the same time, we expect that banks will actively engage fintech companies in collaborative ventures to enhance their digital capabilities.

Commonwealth Bank confirms loss of details of almost 20 million accounts

More bad news relating to CBA. They have confirmed the loss of data relating to almost 20 million accounts. The event happened in 2016, and they decided not to inform customers, as the data “most likely” had been destroyed.

From The ABC.

The Commonwealth Bank has confirmed it lost the historical financial statements of almost 20 million accounts, but insists its customers’ information has not been compromised.

The statements, containing customers’ names, addresses, account numbers and transaction details from 2000 to 2016, were stored on two magnetic tapes which were lost by subcontractor Fuji Xerox last year.

When the bank became aware of the incident, it said, it ordered an independent “forensic” investigation to figure out what had happened and informed the Office of the Australian Information Commissioner (OAIC).

The inquiry, conducted by KPMG, determined the tapes had most likely been disposed of.

Commonwealth Bank’s Angus Sullivan described the incident as “unacceptable” but said the tapes did not contain any passwords or PINs that could compromise customers’ accounts.

CBA said:

Following recent media reports detailing an incident in May 2016, we want to reassure you there is no evidence of your information being compromised and you do not need to take any action.

Here is what you need to know:

  • There is no evidence that any customer information was compromised.
  • In May 2016 we were unable to confirm the scheduled destruction of two magnetic tapes used by a supplier to print bank statements. These tapes contained information including customer names, addresses, account numbers and transaction details.
  • They did not contain passwords or PINs which could enable fraud.
  • We deployed enhanced reporting and ongoing monitoring of customer accounts to ensure customers were protected. These protections are still in place today.
  • This was not cyber-related. CommBank’s technology platforms, systems, services, apps and websites were not compromised.
  • CommBank offers you a 100% security guarantee against fraud for all your accounts, where you are not at fault. We cover any loss should someone make an unauthorised transaction.


Shadow profiles – Facebook knows about you, even if you’re not on Facebook

From The Conversation.

Facebook’s founder and chief executive Mark Zuckerberg faced two days of grilling before US politicians this week, following concerns over how his company deals with people’s data.

But the data Facebook has on people who are not signed up to the social media giant also came under scrutiny.

During Zuckerberg’s congressional testimony he claimed to be ignorant of what are known as “shadow profiles”.

Zuckerberg: I’m not — I’m not familiar with that.

That’s alarming, given that we have been discussing this element of Facebook’s non-user data collection for the past five years, ever since the practice was brought to light by researchers at Packet Storm Security.

Maybe it was just the phrase “shadow profiles” with which Zuckerberg was unfamiliar. It wasn’t clear, but others were not impressed by his answer.

Facebook’s proactive data-collection processes have been under scrutiny in previous years, especially as researchers and journalists have delved into the workings of Facebook’s “Download Your Information” and “People You May Know” tools to report on shadow profiles.

Shadow profiles

To explain shadow profiles simply, let’s imagine a social group of three people – Ashley, Blair and Carmen – who already know one another, and have each other’s email addresses and phone numbers in their phones.

If Ashley joins Facebook and uploads her phone contacts to Facebook’s servers, then Facebook can proactively suggest friends whom she might know, based on the information she uploaded.

For now, let’s imagine that Ashley is the first of her friends to join Facebook. The information she uploaded is used to create shadow profiles for both Blair and Carmen — so that if Blair or Carmen joins, they will be recommended Ashley as a friend.

Next, Blair joins Facebook, uploading his phone’s contacts too. Thanks to the shadow profile, he has a ready-made connection to Ashley in Facebook’s “People You May Know” feature.

At the same time, Facebook has learned more about Carmen’s social circle — in spite of the fact that Carmen has never used Facebook, and therefore has never agreed to its policies for data collection.
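
To make the mechanics concrete, here is a toy sketch in Python, purely illustrative and in no way Facebook’s actual implementation, of how uploaded contact lists can accumulate into profiles for people who have never signed up:

    # Toy illustration of shadow-profile accumulation from contact uploads.
    # Purely hypothetical data structures; not Facebook's actual implementation.
    from collections import defaultdict

    profiles = defaultdict(set)   # person -> people known to be connected to them
    members = set()               # people who have actually joined

    def join(person, contacts):
        """A person joins and uploads their phone contacts."""
        members.add(person)
        for contact in contacts:
            # Each upload enriches a profile for the contact,
            # whether or not that contact has ever joined.
            profiles[contact].add(person)
            profiles[person].add(contact)

    def suggest_friends(person):
        """'People You May Know': connections recorded before the person joined."""
        return [p for p in profiles[person] if p in members and p != person]

    # Ashley joins first and uploads Blair's and Carmen's details.
    join("Ashley", ["Blair", "Carmen"])

    # Blair joins later; a ready-made connection to Ashley already exists.
    join("Blair", ["Ashley", "Carmen"])
    print(suggest_friends("Blair"))   # ['Ashley']

    # Carmen has never joined, yet a profile of her social circle exists.
    print(profiles["Carmen"])         # {'Ashley', 'Blair'}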

Despite the scary-sounding name, I don’t think there is necessarily any malice or ill will in Facebook’s creation and use of shadow profiles.

It seems like an earnestly designed feature in service of Facebook’s goal of connecting people. It’s a goal that clearly also aligns with Facebook’s financial incentives for growth and garnering advertising attention.

But the practice brings to light some thorny issues around consent, data collection, and personally identifiable information.

What data?

Some of the questions Zuckerberg faced this week highlighted issues relating to the data that Facebook collects from users, and the consent and permissions that users give (or are unaware they give).

Facebook is often quite deliberate in its characterisations of “your data”, rejecting the notion that it “owns” user data.

That said, there are a lot of data on Facebook, and what exactly is “yours” or just simply “data related to you” isn’t always clear. “Your data” notionally includes your posts, photos, videos, comments, content, and so on. It’s anything that could be considered copyrightable work or intellectual property (IP).

What’s less clear is the state of your rights relating to data that is “about you”, rather than supplied by you. This is data that is created by your presence or your social proximity to Facebook.

Examples of data “about you” might include your browsing history and data gleaned from cookies, tracking pixels, and the like button widget, as well as social graph data supplied whenever Facebook users supply the platform with access to their phone or email contact lists.

Like most internet platforms, Facebook rejects any claim to ownership of the IP that users post. To avoid falling foul of copyright issues in the provision of its services, Facebook demands (as part of its user agreements and Statement of Rights and Responsibilities) a:

…non-exclusive, transferable, sub-licensable, royalty-free, worldwide license to use any IP content that you post on or in connection with Facebook (IP License). This IP License ends when you delete your IP content or your account unless your content has been shared with others, and they have not deleted it.

Data scares

If you’re on Facebook then you’ve probably seen a post that keeps making the rounds every few years, saying:

In response to the new Facebook guidelines I hereby declare that my copyright is attached to all of my personal details…

Part of the reason we keep seeing data scares like this is that Facebook’s lacklustre messaging around user rights and data policies has contributed to confusion, uncertainty and doubt among its users.

It was a point that Republican Senator John Kennedy raised with Zuckerberg this week.

Senator John Kennedy’s exclamation is a strong, but fair assessment of the failings of Facebook’s policy messaging.

After the grilling

Zuckerberg and Facebook should learn from this congressional grilling that they have struggled and occasionally failed in their responsibilities to users.

It’s important that Facebook now makes efforts to communicate more strongly with users about their rights and responsibilities on the platform, as well as the responsibilities that Facebook owes them.

This should go beyond a mere awareness-style PR campaign. It should seek to truly inform and educate Facebook’s users, and people who are not on Facebook, about their data, their rights, and how they can meaningfully safeguard their personal data and privacy.

Given the magnitude of Facebook as an internet platform, and its importance to users across the world, the spectre of regulation will continue to raise its head.

Ideally, the company should look to broaden its governance horizons, by seeking to truly engage in consultation and reform with Facebook’s stakeholders – its users — as well as the civil society groups and regulatory bodies that seek to empower users in these spaces.

Author: Andrew Quodling, PhD candidate researching governance of social media platforms, Queensland University of Technology

How you helped create the crisis in private data

From The Conversation.

As Facebook’s Mark Zuckerberg testifies before Congress, he’s likely wondering how his company got to the point where he must submit to public questioning. It’s worth pondering how we, the Facebook-using public, got here too.

The scandal in which Cambridge Analytica harvested data from millions of Facebook users to craft and target advertising for Donald Trump’s presidential campaign has provoked broad outrage. More helpfully, it has exposed the powerful yet perilous role of data in U.S. society.

Repugnant as its methods were, Cambridge Analytica did not create this crisis on its own. As I argue in my forthcoming book, “The Known Citizen: A History of Privacy in Modern America,” not only big corporations (in this case, Facebook) and political interests (in this case, right-wing parties and campaigns), but also ordinary Americans (social media users, and thus likely you and me) had a hand in it.

The allure of aggregate data

Businesses and governments have led the way. As long ago as the 1840s, credit-lending firms understood the profits to be made from customers’ financial reputations. These precursors of Equifax, Experian and TransUnion eventually became enormous clearinghouses of personal data.

For its part, the federal government, from the earliest census in 1790 to the creation of New Deal social welfare programs, has long relied on aggregate as well as individual data to distribute resources and administer benefits. For example, a person’s individual Social Security payments depend in part on changes in the overall cost of living across the country.

Police forces and national security analysts, too, gathered fingerprints and other data in the name of social control. Today, they employ some of the same methods as commercial data miners to profile criminals or terrorists, crafting ever-tighter nets of detection. State-of-the-art public safety tools include access to social media accounts, online photographs, geolocation information and cell tower data.

Probing the personal

The search for better data in the 20th century often meant delving into individuals’ most personal, intimate lives. To that end, marketers, strategists and behavioral researchers conducted increasingly sophisticated surveys, polls and focus groups. They identified effective ways to reach specific customers and voters – and often, to influence their behaviors.

In the middle of the last century, for example, motivational researchers sought psychological knowledge about consumers in the hopes of subconsciously influencing them through subliminal advertising. Those probes into consumers’ personalities and desires foreshadowed Cambridge Analytica’s pitch to commercial and political clients – using data, as its website proudly proclaims, “to change audience behavior.”

Citizens were not just unwitting victims of these schemes. People have regularly, and willingly, revealed details about themselves in the name of security, convenience, health, social connection and self-knowledge. Despite rising public concerns about privacy and data insecurity, large numbers of Americans still find benefits in releasing their data to government and commercial enterprises, whether through E-ZPasses, Fitbits or Instagram posts.

Revealing ourselves

It is perhaps particularly appropriate that the Facebook scandal bloomed from a personality test app, “This is your digital life.” For decades, human relations departments and popular magazines have urged Americans to yield private details, and harness the power of aggregate data, to better understand themselves. But in most situations, people weren’t consciously trading privacy for that knowledge.

In the linked and data-hungry internet age, however, those volunteered pieces of information take on lives of their own. Individual responses from 270,000 people on this particular test became a gateway to more data, including that belonging to another 87 million of their friends.

Today, data mining corporations, political operatives and others seek data everywhere, hoping to turn that information to their own advantage. As Cambridge Analytica’s actions revealed, those groups will use data for startling purposes – such as targeting very specific groups of voters with highly customized messages – even if it means violating the policies and professed intentions of one of the most powerful corporations on the planet.

The benefits of aggregate data help explain why it has been so difficult to enact rigorous privacy laws in the U.S. As government and corporate data-gathering efforts swelled over the last century, citizens largely accepted, without much discussion or protest, that their society would be fueled by the collection of personal information. In this sense, we have all – regular individuals, government agencies and corporations like Facebook – collaborated to create the present crisis around private data.

But as Zuckerberg’s summons to Washington suggests, people are beginning to grasp that Facebook’s enormous profits exploit the value of their information and come at the price of their privacy. By making the risks of this arrangement clear, Cambridge Analytica may have done some good after all.

Author: Sarah Igo, Associate Professor of History; Associate Professor of Political Science; Associate Professor of Sociology; Associate Professor of Law, Vanderbilt University

It’s time for third-party data brokers to emerge from the shadows

From The Conversation.

Facebook announced last week it would discontinue the partner programs that allow advertisers to use third-party data from companies such as Acxiom, Experian and Quantium to target users.

Graham Mudd, Facebook’s product marketing director, said in a statement:

We want to let advertisers know that we will be shutting down Partner Categories. This product enables third party data providers to offer their targeting directly on Facebook. While this is common industry practice, we believe this step, winding down over the next six months, will help improve people’s privacy on Facebook.

Few people seemed to notice, and that’s hardly surprising. These data brokers operate largely in the background.

The invisible industry worth billions

In 2014, one researcher described the entire industry as “largely invisible”. That’s no mean feat, given how much money is being made. Personal data has been dubbed the “new oil”, and data brokers are very efficient miners. In the 2018 fiscal year, Acxiom expects annual revenue of approximately US$945 million.

The data broker business model involves accumulating information about internet users (and non-users) and then selling it. As such, data brokers have highly detailed profiles on billions of individuals, comprising age, race, sex, weight, height, marital status, education level, politics, shopping habits, health issues, holiday plans, and more.

These profiles come not just from data you’ve shared, but from data shared by others, and from data that’s been inferred. In its 2014 report into the industry, the US Federal Trade Commission (FTC) showed how a single data broker had 3,000 “data segments” for nearly every US consumer.

Based on the interests inferred from this data, consumers are then placed in categories such as “dog owner” or “winter activity enthusiast”. However, some categories are potentially sensitive, including “expectant parent”, “diabetes interest” and “cholesterol focus”, or involve ethnicity, income and age. The FTC’s Jon Leibowitz described data brokers as the “unseen cyberazzi who collect information on all of us”.

In Australia, Facebook launched the Partner Categories program in 2015. Its aim was to “reach people based on what they do and buy offline”. This includes demographic and behavioural data, such as purchase history and home ownership status, which might come from public records, loyalty card programs or surveys. In other words, Partner Categories enables advertisers to use data brokers to reach specific audiences. This is particularly useful for companies that don’t have their own customer databases.

A growing concern

Third party access to personal data is causing increasing concern. This week, Grindr was shown to be revealing its users’ HIV status to third parties. Such news is unsettling, as if there are corporate eavesdroppers on even our most intimate online engagements.

The recent Cambridge Analytica furore stemmed from third parties. Indeed, apps created by third parties have proved particularly problematic for Facebook. From 2007 to 2014, Facebook encouraged external developers to create apps for users to add content, play games, share photos, and so on.

Facebook then gave the app developers wide-ranging access to user data, and to users’ friends’ data. The data shared might include details of schooling, favourite books and movies, or political and religious affiliations.

As one group of privacy researchers noted in 2011, this process, “which nearly invisibly shares not just a user’s, but a user’s friends’ information with third parties, clearly violates standard norms of information flow”.

With the Partner Categories program, the buying, selling and aggregation of user data may be largely hidden, but is it unethical? The fact that Facebook has moved to stop the arrangement suggests that it might be.

More transparency and more respect for users

To date, there has been insufficient transparency, insufficient fairness and insufficient respect for user consent. This applies to Facebook, but also to app developers, and to Acxiom, Experian, Quantium and other data brokers.

Users might have clicked “agree” to terms and conditions that contained a clause ostensibly authorising such sharing of data. However, it’s hard to construe this type of consent as morally justifying.

In Australia, new laws are needed. Data flows in complex and unpredictable ways online, and legislation ought to provide, under threat of significant penalties, that companies (and others) must abide by reasonable principles of fairness and transparency when they deal with personal information. Further, such legislation can help specify what sort of consent is required, and in which contexts. Currently, the Privacy Act doesn’t go far enough, and is too rarely invoked.

In its 2014 report, the US Federal Trade Commission called for laws that enabled consumers to learn about the existence and activities of data brokers. That should be a starting point for Australia too: consumers ought to have reasonable access to information held by these entities.

Time to regulate

Having resisted regulation since 2004, Mark Zuckerberg has finally conceded that Facebook should be regulated – and advocated for laws mandating transparency for online advertising.

Historically, Facebook has made a point of dedicating itself to openness, but Facebook itself has often operated with a distinct lack of openness and transparency. Data brokers have been even worse.

Facebook’s motto used to be “Move fast and break things”. Now Facebook, data brokers and other third parties need to work with lawmakers to move fast and fix things.

Author: Sacha Molitorisz, Postdoctoral Research Fellow, Centre for Media Transition, Faculty of Law, University of Technology Sydney

How Cambridge Analytica’s Facebook targeting model really worked

From The Conversation.

The researcher whose work is at the center of the Facebook-Cambridge Analytica data analysis and political advertising uproar has revealed that his method worked much like the one Netflix uses to recommend movies.

In an email to me, Cambridge University scholar Aleksandr Kogan explained how his statistical model processed Facebook data for Cambridge Analytica. The accuracy he claims suggests it works about as well as established voter-targeting methods based on demographics like race, age and gender.

If confirmed, Kogan’s account would mean the digital modeling Cambridge Analytica used was hardly the virtual crystal ball a few have claimed. Yet the numbers Kogan provides also show what is – and isn’t – actually possible by combining personal data with machine learning for political ends.

Regarding one key public concern, though, Kogan’s numbers suggest that information on users’ personalities or “psychographics” was just a modest part of how the model targeted citizens. It was not a personality model strictly speaking, but rather one that boiled down demographics, social influences, personality and everything else into a big correlated lump. This soak-up-all-the-correlation-and-call-it-personality approach seems to have created a valuable campaign tool, even if the product being sold wasn’t quite as it was billed.

The promise of personality targeting

In the wake of the revelations that Trump campaign consultants Cambridge Analytica used data from 50 million Facebook users to target digital political advertising during the 2016 U.S. presidential election, Facebook has lost billions in stock market value, governments on both sides of the Atlantic have opened investigations, and a nascent social movement is calling on users to #DeleteFacebook.

But a key question has remained unanswered: Was Cambridge Analytica really able to effectively target campaign messages to citizens based on their personality characteristics – or even their “inner demons,” as a company whistleblower alleged?

If anyone would know what Cambridge Analytica did with its massive trove of Facebook data, it would be Aleksandr Kogan and Joseph Chancellor. It was their startup Global Science Research that collected profile information from 270,000 Facebook users and tens of millions of their friends using a personality test app called “thisisyourdigitallife.”

Part of my own research focuses on understanding machine learning methods, and my forthcoming book discusses how digital firms use recommendation models to build audiences. I had a hunch about how Kogan and Chancellor’s model worked.

So I emailed Kogan to ask. Kogan is still a researcher at Cambridge University; his collaborator Chancellor now works at Facebook. In a remarkable display of academic courtesy, Kogan answered.

His response requires some unpacking, and some background.

From the Netflix Prize to “psychometrics”

Back in 2006, when it was still a DVD-by-mail company, Netflix offered a reward of $1 million to anyone who developed a better way to make predictions about users’ movie ratings than the company already had. A surprise top competitor was an independent software developer using the pseudonym Simon Funk, whose basic approach was ultimately incorporated into all the top teams’ entries. Funk adapted a technique called “singular value decomposition,” condensing users’ ratings of movies into a series of factors or components – essentially a set of inferred categories, ranked by importance. As Funk explained in a blog post,

“So, for instance, a category might represent action movies, with movies with a lot of action at the top, and slow movies at the bottom, and correspondingly users who like action movies at the top, and those who prefer slow movies at the bottom.”

Factors are artificial categories, which are not always like the kind of categories humans would come up with. The most important factor in Funk’s early Netflix model was defined by users who loved films like “Pearl Harbor” and “The Wedding Planner” while also hating movies like “Lost in Translation” or “Eternal Sunshine of the Spotless Mind.” His model showed how machine learning can find correlations among groups of people, and groups of movies, that humans themselves would never spot.

Funk’s general approach used the 50 or 100 most important factors for both users and movies to make a decent guess at how every user would rate every movie. This method, often called dimensionality reduction or matrix factorization, was not new. Political science researchers had shown that similar techniques using roll-call vote data could predict the votes of members of Congress with 90 percent accuracy. In psychology the “Big Five” model had also been used to predict behavior by clustering together personality questions that tended to be answered similarly.

Still, Funk’s model was a big advance: It allowed the technique to work well with huge data sets, even those with lots of missing data – like the Netflix dataset, where a typical user rated only a few dozen films out of the thousands in the company’s library. More than a decade after the Netflix Prize contest ended, SVD-based methods, or related models for implicit data, are still the tool of choice for many websites to predict what users will read, watch, or buy.
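
For the curious, here is a minimal sketch of the underlying idea – factorising a ratings matrix and predicting from the reduced factors – on a tiny, made-up dataset. It uses a plain truncated SVD for illustration, not Funk’s incremental method, which was designed to cope with the mostly missing entries of the real Netflix data.

    # Minimal sketch of factorising a toy ratings matrix and predicting from
    # the reduced factors. Illustrative only; data and sizes are invented.
    import numpy as np

    # Rows = users, columns = movies, values = 1-5 star ratings (toy data).
    ratings = np.array([
        [5, 4, 1, 1],   # loves action films, dislikes slow dramas
        [4, 5, 2, 1],
        [1, 2, 5, 4],   # the opposite taste
        [2, 1, 4, 5],
    ], dtype=float)

    # Keep only the k most important factors (the inferred "categories").
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    k = 2
    user_factors = U[:, :k] * s[:k]     # each user described by k numbers
    movie_factors = Vt[:k, :]           # each movie described by k numbers

    # A predicted rating is the dot product of user and movie factors.
    predicted = user_factors @ movie_factors
    print(np.round(predicted, 1))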

These models can predict other things, too.

Facebook knows if you are a Republican

In 2013, Cambridge University researchers Michal Kosinski, David Stillwell and Thore Graepel published an article on the predictive power of Facebook data, using information gathered through an online personality test. Their initial analysis was nearly identical to that used on the Netflix Prize, using SVD to categorize both users and things they “liked” into the top 100 factors.

The paper showed that a factor model made with users’ Facebook “likes” alone was 95 percent accurate at distinguishing between black and white respondents, 93 percent accurate at distinguishing men from women, and 88 percent accurate at distinguishing people who identified as gay men from men who identified as straight. It could even correctly distinguish Republicans from Democrats 85 percent of the time. It was also useful, though not as accurate, for predicting users’ scores on the “Big Five” personality test.
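
In outline, that pipeline reduces a large user-by-likes matrix to a small number of factors and then predicts an attribute from those factors. The sketch below illustrates the idea on synthetic data; the component count, classifier and data are assumptions for illustration, not the study’s actual setup.

    # Sketch of the likes -> factors -> attribute pipeline on synthetic data.
    # All data here is randomly generated; the real study used volunteers'
    # actual Facebook likes and survey responses.
    import numpy as np
    from sklearn.decomposition import TruncatedSVD
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n_users, n_likes = 500, 200
    likes = rng.integers(0, 2, size=(n_users, n_likes))   # 1 = user liked the page

    # Toy "attribute" that happens to correlate with a handful of likes.
    attribute = (likes[:, :5].sum(axis=1) > 2).astype(int)

    # Step 1: condense hundreds of likes into a few dozen factors.
    factors = TruncatedSVD(n_components=20, random_state=0).fit_transform(likes)

    # Step 2: predict the attribute from the factors.
    clf = LogisticRegression(max_iter=1000).fit(factors[:400], attribute[:400])
    print("held-out accuracy:", clf.score(factors[400:], attribute[400:]))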

There was public outcry in response; within weeks Facebook had made users’ likes private by default.

Kogan and Chancellor, also Cambridge University researchers at the time, were starting to use Facebook data for election targeting as part of a collaboration with Cambridge Analytica’s parent firm SCL. Kogan invited Kosinski and Stillwell to join his project, but it didn’t work out. Kosinski reportedly suspected Kogan and Chancellor might have reverse-engineered the Facebook “likes” model for Cambridge Analytica. Kogan denied this, saying his project “built all our models using our own data, collected using our own software.”

What did Kogan and Chancellor actually do?

As I followed the developments in the story, it became clear Kogan and Chancellor had indeed collected plenty of their own data through the thisisyourdigitallife app. They certainly could have built a predictive SVD model like that featured in Kosinski and Stillwell’s published research.

So I emailed Kogan to ask if that was what he had done. Somewhat to my surprise, he wrote back.

“We didn’t exactly use SVD,” he wrote, noting that SVD can struggle when some users have many more “likes” than others. Instead, Kogan explained, “The technique was something we actually developed ourselves … It’s not something that is in the public domain.” Without going into details, Kogan described their method as “a multi-step co-occurrence approach.”

However, his message went on to confirm that his approach was indeed similar to SVD and other matrix factorization methods, like those used in the Netflix Prize competition and the Kosinski-Stillwell-Graepel Facebook model. Dimensionality reduction of Facebook data was the core of his model.

How accurate was it?

Kogan suggested the exact model used doesn’t matter much, though – what matters is the accuracy of its predictions. According to Kogan, the “correlation between predicted and actual scores … was around [30 percent] for all the personality dimensions.” By comparison, a person’s previous Big Five scores are about 70 to 80 percent accurate in predicting their scores when they retake the test.

Kogan’s accuracy claims cannot be independently verified, of course. And anyone in the midst of such a high-profile scandal might have incentive to understate his or her contribution. In his appearance on CNN, Kogan explained to an increasingly incredulous Anderson Cooper that, in fact, the models had actually not worked very well.

In fact, the accuracy Kogan claims seems a bit low, but plausible. Kosinski, Stillwell and Graepel reported comparable or slightly better results, as have several other academic studies using digital footprints to predict personality (though some of those studies had more data than just Facebook “likes”). It is surprising that Kogan and Chancellor would go to the trouble of designing their own proprietary model if off-the-shelf solutions seem to be just as accurate.

Importantly, though, the model’s accuracy on personality scores allows comparisons of Kogan’s results with other research. Published models with equivalent accuracy in predicting personality are all much more accurate at guessing demographics and political variables.

For instance, the similar Kosinski-Stillwell-Graepel SVD model was 85 percent accurate in guessing party affiliation, even without using any profile information other than likes. Kogan’s model had similar or better accuracy. Adding even a small amount of information about friends or users’ demographics would likely boost this accuracy above 90 percent. Guesses about gender, race, sexual orientation and other characteristics would probably be more than 90 percent accurate too.

Critically, these guesses would be especially good for the most active Facebook users – the people the model was primarily used to target. Users with less activity to analyze are likely not on Facebook much anyway.

When psychographics is mostly demographics

Knowing how the model is built helps explain Cambridge Analytica’s apparently contradictory statements about the role – or lack thereof – that personality profiling and psychographics played in its modeling. They’re all technically consistent with what Kogan describes.

A model like Kogan’s would give estimates for every variable available on any group of users. That means it would automatically estimate the Big Five personality scores for every voter. But these personality scores are the output of the model, not the input. All the model knows is that certain Facebook likes, and certain users, tend to be grouped together.

With this model, Cambridge Analytica could say that it was identifying people with low openness to experience and high neuroticism. But the same model, with the exact same predictions for every user, could just as accurately claim to be identifying less educated older Republican men.

Kogan’s information also helps clarify the confusion about whether Cambridge Analytica actually deleted its trove of Facebook data, when models built from the data seem to still be circulating, and even being developed further.

The whole point of a dimension reduction model is to mathematically represent the data in simpler form. It’s as if Cambridge Analytica took a very high-resolution photograph, resized it to be smaller, and then deleted the original. The photo still exists – and as long as Cambridge Analytica’s models exist, the data effectively does too.
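
A toy numerical sketch of that point (not Cambridge Analytica’s actual pipeline): once a low-rank model has been built, it reproduces almost everything in the data it was fitted to, so deleting the raw files loses very little.

    # Toy sketch: a reduced model retains most of the information in the data
    # it was built from, much like a downsized copy of a photograph.
    import numpy as np

    rng = np.random.default_rng(1)

    # Fake "raw data": 1,000 users x 300 attributes with low-rank structure.
    raw = rng.normal(size=(1000, 10)) @ rng.normal(size=(10, 300))
    raw += 0.1 * rng.normal(size=(1000, 300))            # plus a little noise

    # The "model": keep only the ten strongest factors (the resized photo).
    U, s, Vt = np.linalg.svd(raw, full_matrices=False)
    model = (U[:, :10] * s[:10]) @ Vt[:10, :]

    # Even if the original files are deleted, the model reproduces them closely.
    relative_error = np.linalg.norm(raw - model) / np.linalg.norm(raw)
    print(f"information lost: {relative_error:.1%}")     # only a few percent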

Author: Matthew Hindman, Associate Professor of Media and Public Affairs, George Washington University

Using Credit Card Payments Data For The Public Good

Interesting post from the UK’s Office for National Statistics blog, which highlights the power of data analytics using anonymised credit card payments data.

The intelligent use of data gathered by our leading financial institutions can result in faster, more detailed economic statistics. Tom Smith describes how a joint event staged by ONS and Barclaycard illustrates the vast statistical potential of anonymised payments data.

“My job at the Data Science Campus brings many fascinating days as we work with organisations across government and the UK to unlock the power of data. One recent event particularly stands out.

Our experts from across ONS joined forces with analysts from one of the world’s biggest financial organisations to explore how commercial payments data could help tackle some of the UK’s biggest economic questions.

Following a successful knowledge sharing day at the ONS Data Science Campus, Barclaycard, which sees nearly half of the nation’s debit and credit card transactions, hosted a ‘hackathon’ at the state-of-the-art fintech innovation centre Rise. This brought together 50 economists, developers, data scientists and analysts to address three challenges:

  • How could payments data improve our understanding of regional economies?
  • Where could financial inclusion policies best be targeted?
  • How could we use payments data to create superfast economic indicators?

Over two days, the ONS and Barclaycard teams worked collaboratively – in some cases right through the night – to identify how the payments data could be used to improve our understanding of the economy. The traditional hackathon finish saw the teams ‘pitching’ their work to a panel of judges from across ONS and Barclaycard.

The winning team focused on building predictors and indicators that provide fine-detail information for trending economic changes. Even at this early stage of development, their work shows how bringing together card spending data and economic data held by ONS could improve the information available for policy & strategy decision makers to make timely economic decisions.

There is much work to be done to turn this demonstration into a working model. But one of the things that stood out for the judges was the winning team’s roadmap for how to get there, including the development and data architecture needed for a successful prototype.

“We’re really excited to play a key role in helping to support a better understanding of UK economic trends and growth. The hackathon was a great event to harness the excitement and expertise created through our partnership with the ONS, and the winning teams have shown tangible evidence that payments data can indeed be used for public good.” – Jon Hussey, MD Data & Strategic Analytics, Barclaycard International

For the Data Science Campus, collaborations are all about knowledge exchange. They are an opportunity for us to access expertise in tools, technologies and approaches to data science from outside government, evaluate them in a safe environment, and share our learning across ONS and wider government.

It was inspiring to see the level of energy, drive and collaboration, and to pool ONS and Barclaycard skills into understanding how payments data can be used for public good. (And it is worth pointing out that no money changed hands and no personal data were involved. ONS is only interested in producing aggregate statistics and analysis.)

Our work with Barclaycard illustrates perfectly how the rich data held by partners outside government can improve our understanding of the UK’s economy. This is a key part of ONS’ Better Statistics, Better Decisions strategy, enabling ONS to deliver high quality statistics, develop and implement innovative methods, and build data science capability by tapping in to best practices wherever they may be.

How marketers use algorithms to (try to) read your mind

From The Conversation.

Have you ever looked for a product online and then been recommended the exact thing you need to complement it? Or have you been thinking about a particular purchase, only to receive an email with that product on sale?

All of this may give you a slightly spooky feeling, but what you’re really experiencing is the result of complex algorithms used to predict, and in some cases, even influence your behaviour.

Companies now have access to an unprecedented amount of data on your present and past shopping and browsing preferences. This ranges from transactional data, to website traffic and even social media posts. Predictive algorithms use this data to make inferences about what is likely to happen in the future.

For example, after a few times visiting a coffee shop, the barista might notice that you always order a latte with one sugar. They could then use this “data” to predict that tomorrow you will order the same thing, and have it ready for you before you get there.

Predictive algorithms work the same way, just on a much bigger scale.

How are big data and predictive algorithms used?

My colleagues and I recently conducted a study using online browsing data to show there are five reasons consumers use retail websites, ranging from simply “touching base” to planning a specific purchase.

Using historical data, we were able to see that customers who browse a wide variety of different product categories are less likely to make a purchase than those focused on specific products. Meanwhile, consumers were more likely to purchase if they reached the website through a search engine, compared to a link in an email.

With information like this, websites can be personalised based on the most likely motivation of each visitor. The next time a consumer clicks through from a search engine they can be led straight to checkout, while those wanting to browse can be given time and inspiration.
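
As a rough sketch of how browsing signals like these could feed a purchase-propensity model, here is a small illustration. The features, data and coefficients below are invented for illustration; they are not the actual variables from the study.

    # Sketch: scoring purchase likelihood from simple session features.
    # Feature names and data are invented for illustration only.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(42)
    n = 2000

    # Per-visit features: breadth of browsing and how the visitor arrived.
    categories_browsed = rng.integers(1, 10, size=n)
    from_search_engine = rng.integers(0, 2, size=n)      # 1 = search, 0 = email link

    # Toy ground truth echoing the findings: focused visits from search convert more.
    p = 1 / (1 + np.exp(-(1.5 * from_search_engine - 0.4 * categories_browsed + 0.5)))
    purchased = rng.random(n) < p

    X = np.column_stack([categories_browsed, from_search_engine])
    model = LogisticRegression().fit(X, purchased)

    # Score a focused visitor arriving from a search engine.
    print(model.predict_proba([[2, 1]])[0, 1])           # estimated purchase probability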

Somewhat similar to this are the predictive algorithms used to make recommendations on websites like Amazon and Netflix. Analysts estimate that 35% of what people buy on Amazon, and 75% of what they watch on Netflix, is driven by these algorithms.

These algorithms also work by analysing both your past behaviour (e.g. what you have bought or watched), as well as the behaviour of others (e.g. what people who bought or watched the same thing also bought or watched). The key to the success of these algorithms is the scope of data available. By analysing the past behaviour of similar consumers, these algorithms are able to make recommendations that are more likely to be accurate, rather than relying on guess work.
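
The “people who bought this also bought” logic behind such recommenders can be sketched in a few lines. This is a toy item-to-item similarity example on invented purchases, a long way from the scale and refinements of Amazon’s or Netflix’s production systems:

    # Toy sketch of item-to-item collaborative filtering on invented purchases.
    import numpy as np

    items = ["coffee", "grinder", "teapot", "green tea"]

    # Rows = customers, columns = items, 1 = purchased.
    purchases = np.array([
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [1, 0, 0, 0],
        [0, 0, 1, 1],
        [0, 0, 1, 1],
    ], dtype=float)

    # Cosine similarity between item purchase patterns.
    norms = np.linalg.norm(purchases, axis=0)
    similarity = (purchases.T @ purchases) / np.outer(norms, norms)
    np.fill_diagonal(similarity, 0)          # never recommend the item itself

    def also_bought(item):
        """Items most often bought by the same customers."""
        idx = items.index(item)
        ranked = np.argsort(similarity[idx])[::-1]
        return [items[i] for i in ranked if similarity[idx, i] > 0]

    print(also_bought("coffee"))             # ['grinder']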

For the curious, part of Amazon’s famous recommendation algorithm was recently released as an open source project for others to build upon.

But of course, there are innumerable other data points for algorithms to analyse than just behaviour. US retailer Walmart famously stocked up on strawberry pop-tarts in the lead up to a major storm. This was the result of simple analysis of past weather data and how that influenced demand.

It is also possible to predict how purchase behaviour is likely to evolve in the future. Algorithms can predict whether a consumer is likely to change purchase channel (e.g. from in-store to online), or even if certain customers are likely to stop shopping.

Prior studies that have applied these algorithms have found companies can influence a consumer’s choice of purchase channel and even purchase value by changing the way they communicate with them, and can use promotional campaigns to decrease customer churn.
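
A churn model of this kind typically scores each customer from simple behavioural features. The sketch below uses invented recency and frequency data and an off-the-shelf classifier; it is not the specific model from those studies.

    # Sketch: flagging customers at risk of churning from recency/frequency data.
    # Features, thresholds and data are invented for illustration.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(7)
    n = 3000

    days_since_last_purchase = rng.integers(1, 365, size=n)
    purchases_last_year = rng.poisson(6, size=n)

    # Toy ground truth: long gaps and few purchases make churn more likely.
    p = 1 / (1 + np.exp(-(0.02 * days_since_last_purchase - 0.5 * purchases_last_year)))
    churned = rng.random(n) < p

    X = np.column_stack([days_since_last_purchase, purchases_last_year])
    model = GradientBoostingClassifier().fit(X[:2500], churned[:2500])

    # Customers above a chosen risk threshold might receive a retention offer.
    risk = model.predict_proba(X[2500:])[:, 1]
    print("customers flagged:", int((risk > 0.7).sum()))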

Should I be concerned?

While these predictive algorithms undoubtedly provide benefits, there are also serious issues about privacy. In the past there have been claims that companies have predicted consumers are pregnant before they know themselves.

These privacy concerns are critical and require careful consideration from both businesses and government.

However, it is important to remember that companies are not truly interested in any one consumer. While many of these algorithms are designed to mimic “personal” recommendations, in fact they are based on behaviour across the whole customer base. Additionally, the recommendations or promotions given to each individual are automated from the database, so the chances of any staff actually knowing about an individual customer are extremely low.

Consumers can also benefit from companies using these predictive algorithms. For example, if you search for a product online, chances are you will be targeted with ads for that product over the next few days. Depending on the company, these ads may include discount codes to encourage you to purchase. By waiting a few days after browsing, you may be able to get a discount for a product you were intending to buy anyway.

Alternatively, look for companies who adjust their price based on forecasted demand. By learning when the low-demand periods are, you can pick yourself up a bargain at lower prices. So while companies are turning to predictive analytics to try to read consumers’ minds, some smart shopping behaviours can make it a two-way street.

Author: Jason Pallant, Lecturer of Marketing, Swinburne University of Technology