Nivedit Majumdar

Differential Privacy with AI and the future of health tracking

One of the major reasons people are wary of jumping onto the self-tracking bandwagon is the privacy of their data. Granted, when you’re trusting a fitness tracker to record how much you’ve run and how your heart rate varies, you’re also trusting a slew of services working in the background. How can the balance between useful data collection and privacy be achieved?

With iOS 10, Apple first brought differential privacy to its own platform. And trust Apple to come up with a solution to the whole data gathering / privacy conundrum. While data-driven AI services need to collect data, differential privacy brings an entirely new level of identity masking into play.

And that shall be the crux of this article: How differential privacy might just be the most elegant solution to the privacy concerns plaguing lifeloggers out there.


There are quite a few explanations of what differential privacy means, and most of them complicate things. In simple terms, differential privacy involves adding ‘noise’ (in the form of randomised data) to an individual’s usage patterns. This lets the algorithms learn what they need from the data, without knowing which individual any of it came from.


Think of a survey where a number of participants voiced their opinions on something. Once the final results are compiled, they show only the categories and the percentage of people who chose each one, not the actual people themselves.
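To make the ‘noise’ idea concrete, here is a minimal Python sketch of randomized response, a classic survey technique that satisfies differential privacy. It is an illustration only and not Apple’s implementation; the function names, the 30% true rate and the sample size are all assumptions made up for this example.

```python
import random

def randomized_response(truth, rng):
    """Report the true answer half the time, and a fresh coin flip otherwise."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5

def estimate_true_rate(reports):
    """Undo the noise in aggregate: P(report yes) = 0.5 * p + 0.25."""
    observed = sum(reports) / len(reports)
    return (observed - 0.25) / 0.5

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: roughly 30% would truthfully answer "yes".
true_answers = [rng.random() < 0.3 for _ in range(100_000)]
reports = [randomized_response(answer, rng) for answer in true_answers]

true_rate = sum(true_answers) / len(true_answers)
estimate = estimate_true_rate(reports)
print(f"true rate: {true_rate:.3f}, estimated from noisy reports: {estimate:.3f}")
```

Any single report is deniable, since a ‘yes’ could have come from the coin flip, yet the aggregate estimate lands close to the true rate. With these coin probabilities, a ‘yes’ is three times as likely from someone whose true answer is yes as from someone whose true answer is no, which corresponds to a privacy parameter of ln 3 (about 1.1).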

In iOS, differential privacy has a lot of legwork behind it. According to Apple’s Senior VP of software engineering Craig Federighi, Apple doesn’t assemble user profiles, has end-to-end encryption on iMessage and tries to keep computation on the user’s device instead of sending the data to Apple’s servers. On top of that, differential privacy is presented as the way to gather the data that data-driven software needs.

In his own words, according to Wired, “Differential privacy is a research topic in the areas of statistics and data analytics that uses hashing, subsampling and noise injection to enable…crowdsourced learning while keeping the data of individual users completely private. Apple has been doing some super-important work in this area to enable differential privacy to be deployed at scale.”



Differential privacy is said to have been inspired by two earlier concepts in statistics: the minimum query set size and Dalenius’ statistical disclosure definition. More details can be found in this article by InfoQ, but I’ll just give the idea here.

The minimum query set size aimed to keep aggregate queries safe (such as SUM and AVERAGE, as used in contingency tables and histograms) by refusing to answer any query that covers fewer than a set number of records. It had its own share of drawbacks, though: it could be bypassed through tracker attacks.
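The weakness is easy to see in a small Python sketch. This shows a simple difference attack, the basic idea that tracker attacks build on; the records, the threshold of 3 and the function names are all invented for illustration.

```python
MIN_QUERY_SET_SIZE = 3  # assumed threshold for this example

# Hypothetical fitness records; every value here is made up.
RECORDS = [
    {"age": 34, "resting_hr": 58},
    {"age": 29, "resting_hr": 64},
    {"age": 41, "resting_hr": 71},
    {"age": 29, "resting_hr": 66},
    {"age": 52, "resting_hr": 75},
]

def guarded_sum(field, predicate):
    """Answer a SUM query only if enough records match the predicate."""
    matches = [r for r in RECORDS if predicate(r)]
    if len(matches) < MIN_QUERY_SET_SIZE:
        raise ValueError("query set too small")
    return sum(r[field] for r in matches)

def is_target(record):
    # The attacker happens to know the target is the only 34-year-old.
    return record["age"] == 34

# Asking about the target directly is refused...
try:
    guarded_sum("resting_hr", is_target)
    direct_query_allowed = True
except ValueError:
    direct_query_allowed = False

# ...but two large, individually "safe" queries leak the same value.
everyone = guarded_sum("resting_hr", lambda r: True)
everyone_else = guarded_sum("resting_hr", lambda r: not is_target(r))
leaked_resting_hr = everyone - everyone_else

print(direct_query_allowed, leaked_resting_hr)
```

Both permitted queries cover at least three records, yet their difference is exactly the target’s resting heart rate.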

Dalenius’ statistical disclosure definition proposed a strict definition of data privacy: that the attacker should learn nothing about an individual that they didn’t know before using the sensitive dataset. While slightly more robust, it failed since certain types of background information could always lead to a new conclusion about an individual.

Differential privacy, according to a post by GitHub user frankmcsherry, requires that the probability a computation produces any given output changes by at most a multiplicative factor when you add or remove one record from the input.


The probability derives only from the randomness of the computation; all other quantities (input data, record to add or remove, given output) are taken to be the worst possible case. The largest multiplicative factor (actually, its natural logarithm) quantifies the amount of “privacy difference”.
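Written out, this requirement is usually stated as the inequality below. The symbols are notation chosen here rather than taken from the post: $M$ is the randomised computation, $D$ and $D'$ are any two datasets that differ in a single record, $S$ is any set of possible outputs, and $\varepsilon$ is the privacy parameter.

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(D') \in S]$$

The multiplicative factor is $e^{\varepsilon}$, so its natural logarithm is $\varepsilon$. The smaller $\varepsilon$ is, the less any one record can shift the distribution of outputs.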


With the introduction of differential privacy, data can be collected in aggregate without being tied to an individual. For instance, there’s a new emoji replacement feature that takes in aggregate data from multiple iPhones to suggest emojis in place of words (a peach will pop up as an emoji suggestion when someone types in, ahem, ‘butt’).

This also extends to suggested replies, predictive text and search in Spotlight and Notes. But the real potential for lifeloggers will be realised in iOS 11.


Differential privacy in iOS 11 will dynamically learn commonly used Health Types. Based on the (anonymous) inputs from all its users, Apple wants to learn which Health Types are most commonly selected, which can give a more granular view of lifelogging and, therefore, the Quantified Self.


Differential privacy will look at broader trends without going into user specifics, so a certain amount of privacy and security is built in. Anonymizing data is already a measure followed by most companies, but if a statistician can connect two similar data points together, it would invariably reveal the identity of an individual. Differential privacy is designed to prevent exactly this.
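Here is a minimal Python sketch of how that kind of linking works against plainly anonymized data. Both datasets, the field names and the `link` helper are hypothetical and exist only to illustrate the idea.

```python
# Hypothetical "anonymized" health export: names removed, other fields kept.
health_rows = [
    {"zip": "411001", "birth_year": 1983, "sex": "F", "avg_steps": 4200},
    {"zip": "411001", "birth_year": 1990, "sex": "M", "avg_steps": 9800},
    {"zip": "411045", "birth_year": 1983, "sex": "F", "avg_steps": 7600},
]

# Hypothetical public list that includes names alongside the same fields.
public_rows = [
    {"name": "Person A", "zip": "411001", "birth_year": 1983, "sex": "F"},
    {"name": "Person B", "zip": "411045", "birth_year": 1975, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def key(row):
    """Combine the fields that both datasets share."""
    return tuple(row[field] for field in QUASI_IDENTIFIERS)

def link(health, public):
    """Match each named person to a health row when the match is unique."""
    by_key = {}
    for row in health:
        by_key.setdefault(key(row), []).append(row)
    matches = {}
    for person in public:
        candidates = by_key.get(key(person), [])
        if len(candidates) == 1:
            matches[person["name"]] = candidates[0]
    return matches

reidentified = link(health_rows, public_rows)
print(reidentified)
```

No name appears in the health export, but a unique combination of ordinary fields is enough to attach one. Noise added under differential privacy targets this gap, because the guarantee does not depend on what other data an attacker holds.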


However, according to The Verge, it’s difficult to tell from the outside how well differential privacy works. In their own words:

“Unlike the clear black-and-white of encryption, differential privacy works in shades of grey, balancing the reliability of the aggregate information against its potential to identify specific users. That’s sometimes referred to as a privacy budget, a kind of set balance for engineers to work against. But if you don’t work at Apple, it’s difficult to tell how strict that privacy budget really is. Apple insists it’s high enough to prevent any reidentification, but we’re mostly left to take their word for it.”
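The ‘privacy budget’ idea can be sketched in a few lines of Python. This is a generic illustration using the standard Laplace mechanism for counting queries, not a description of how Apple accounts for its budget; the `PrivacyBudget` class, the step counts and the epsilon values are assumptions for the example.

```python
import random

class PrivacyBudget:
    """Tracks how much epsilon is left; each query's epsilon is subtracted."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def spend(self, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon

def laplace_noise(scale, rng):
    # The difference of two exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(values, predicate, epsilon, budget, rng):
    """Counting query with Laplace noise; one record changes a count by at most 1."""
    budget.spend(epsilon)
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1 / epsilon, rng)

rng = random.Random(42)
daily_steps = [3200, 8700, 12050, 4100, 9900, 15000, 7600]  # invented data
budget = PrivacyBudget(total_epsilon=1.0)

over_8k = noisy_count(daily_steps, lambda s: s > 8000, 0.5, budget, rng)
over_10k = noisy_count(daily_steps, lambda s: s > 10000, 0.5, budget, rng)

try:
    noisy_count(daily_steps, lambda s: s > 12000, 0.5, budget, rng)
    third_query_answered = True
except RuntimeError:
    third_query_answered = False

print(over_8k, over_10k, third_query_answered, budget.remaining)
```

A smaller epsilon per query means more noise and a less reliable answer, while a larger total budget means more can be learned about any one user. That trade-off is the ‘shades of grey’ the quote describes.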

One thing’s for certain, though: if differential privacy evolves at the right rate, it could be one of the major game changers for the whole lifelogging space.