S1:E3 Privy's Henri Stern–Data Sovereignty In The Modern Age
Hello and welcome to episode three of Cross-Chain Examination. I'm your host, Katherine Wu.
So lately I've been thinking a lot about my own digital footprint, i.e., what makes up “me” in this digital world. And so obviously that sent me down a whole train of thinking about my own user data: who owns it, whether or not I have access to it, etc. One of the biggest data scandals in recent memory is probably the Facebook/Cambridge Analytica scandal that people talk about a lot. In light of that, I think privacy is probably more important than ever in this increasingly online world. To discuss this with me today, our guest is Henri Stern.
Henri is someone that I've known for a really long time in the crypto world, and I think he's probably one of the most thoughtful and smartest researchers in the crypto space. He's currently the founder of Privy and we have him on today to talk about all things privacy. We talk about what data sovereignty really means, why it's important, and also we talk about the dangers of convenience online.
So without further ado, here is the discussion.
Our big topic today is privacy: how to build for a future where we all, as users, can have more power over our own user data as we're browsing the web, accessing apps and transacting online. And to discuss this with me today is Henri Stern, co-founder of Privy and an OG protocol research scientist in the crypto world. Welcome to the show, Henri.
Thank you. Thanks for having me.
We're very honored that you're here. I wanted to start off this discussion today by talking about privacy, specifically thinking about the world post Facebook/Cambridge Analytica data scandal. As a refresher, a few years ago, whistleblowers leaked a report that revealed that personal data belonging to millions of Facebook users was collected without their explicit consent by the British consulting firm Cambridge Analytica, predominantly to be used for political advertising.
I was listening to a couple of podcasts and blogs of yours, and I know that you've said this yourself: what happened here with Facebook wasn't really a data security issue. Rather, it's the result of bad system design and of users agreeing to things that they don't really understand or that weren't really made clear to them. In light of this and other instances like it, how has it shaped your thinking about the tradeoffs in building data infrastructure, specifically the tradeoffs between convenience and privacy?
Yeah, I think this is really a key issue, and at the end of the day you've sort of got two escape routes. Picture a bowling lane with gutters: you've got a gutter on one side, where the system is made so complex, with so much of a need for users to opt in and agree to everything explicitly, that they just don't use it.
And on the other side, you've got a system that automates everything, assumes intent on behalf of users so often that they're agreeing to things they don't know about.
And so I think the reality is that in that Facebook/Cambridge Analytica world, in all likelihood users had explicitly agreed to this, through Facebook's terms of service and through the way the app works: if you share data with a third party, they can share it with friends of friends, which is how Cambridge Analytica got all their data. And so I think the reality is that this comes down to really poor UX. The way this has influenced my own thinking about systems building is really that systems have to be built with the user in mind.
I think building core permissioning systems, or building systems that on a technical level can be made secure, is not enough. I think developers and software architects and even data scientists and researchers have to be more opinionated in terms of how they're building up their systems and how their systems should play out, down to the UX. This is where Apple has done a really good job, for instance, because Apple owns the entire stack, from the hardware to the OS to some of the apps. That means that when I log in to Uber, Apple gets to ask me, “Would you like to give this app access to your location?” And they can control that because they own the entire system, down to the telemetry that determines my location in the first place. So the challenge for us in web3, and on the web in general, is: how do we do this without owning the entire application stack down to the physical world, down to the device?
But yeah, I would say at a high level, what this has made me understand is that there are two key themes in data privacy. The first is user control, and the name of the game here is informed consent, which is the UX issue. The second is systems architecture, and that's more of what we're used to talking about in web3 with regards to decentralization, with regards to how we use open systems to build a more easily auditable and verifiable data privacy infrastructure.
Zooming out, how does web3, or how do blockchains, seek to solve or mitigate this problem? And do you think it's like an apt or useful solution at all?
And so I guess my point is: the best privacy solution is the one that gets used. If we build really good primitives but we don't carry them all the way through to the user, then I think it's all for nothing. On the flip side, if we build really good UX without any of the underlying primitives that allow us to verify that data is being used the way a developer says it is, then it's an extremely brittle system. In my view, web3 is about ownership, and decentralization allows us to deliver it.
What does decentralization actually get us? What are the key features it buys us? The first, I would say, is transparency: because everything is in the open, we get verifiability, which means as users we potentially get to see what is happening to our data. The second is control: we all have our wallets with which we hold assets, and the chain, in a sense, is the custodial backstop to data ownership. At the end of the day, the data itself can be encrypted under keys I control, which allows me to take this data with me wherever I go. The third is the absence of lock-in: you can actually create whole new user experiences thanks to this world in which the user is in control in the end.
And the last one is that, because of open standards in the space, we get interoperability. And I think this is kind of my key point. Going back to Cambridge Analytica, I think the core sin is that Facebook was building an opaque system. This is the really cool thing about web3, and the reason why I do think it will be the nexus for change: in web3 we're building for interoperability, we're building parts of an ISO-style layering of the stack such that people can build above and beneath it.

And so my core point is: these are really complicated questions, and they won't be solved by any single party. Frankly, they won't be solved once and for all; there is no silver bullet when it comes to privacy. What we can do is keep on building better and better systems, learn from users to present better UX so they can make informed choices, and then learn from the evolving threat landscape on the web to improve how our systems architecture functions. But what that entails is building in the open, knowing that we're going to make mistakes, and being able to own up to those mistakes.

And this is kind of my big fear when I think about web3's role in changing data privacy online: frankly, a lot of the zealotry that we're seeing all over the place. I see this debate around, say, soulbound tokens and NFTs. On one side you have the folks who say soulbound tokens and NFTs are the future of identity, which I very much disagree with. And on the other side you have the folks who say these things are only good for financial transactions, which I also completely disagree with. I think the truth is you need to use the right tool for the right problem. Having a shared state through blockchains, on a technical level, as the backstop to identity systems is the way we get out of the morass that's been identity in web2. But I think it needs to be done super, super deftly. For example, putting user data on-chain is a terrible idea.
Even putting encrypted user data on-chain is a terrible idea.
I was going to say, yeah, going back to your first point, the idea of transparency is held to such a high degree in the web3 crypto space because everything on the blockchain is open and transparent. So how do you square transparency, the blockchain, and privacy?
Let me give you a thought experiment. Let's assume you're trying to share your social security number with a dapp for whatever reason. There are sort of three levels at which you could do this transparently. The first is you share it with a dapp, with a smart contract, and your SSN goes on chain. Encrypted or unencrypted, for all intents and purposes I don't care, because ciphers break over time; the ciphers we use today are not the ciphers we'll use in 50 years. One of the great points of blockchains is for them to be long-lived, to be robust. And so if this data is going to be on the web forever, you certainly don't want the core underlying data, even encrypted, to be on-chain. So that's the first option.
The second option is the data is held off chain, but the actual permissions, who gets to read it, are held on chain. That's better, certainly, because I can't read Katherine's SSN off the chain. But I can know that Katherine is okay with Henri having access to the data and Bob not having access. And now Bob can come after you and say, what the hell, you just gave Henri access and you're not giving me access. The permissions themselves are still public, so that's yet another transparency problem.
And then this is where zero-knowledge proofs can come in. The third system is one in which actors are making claims about data and about permissions and access. These claims are committed on chain, but basically they don't mean anything unless there's a need for dispute resolution. This is the same way state channels work generally: a lot of activity happens off-chain, but the chain is the end settlement layer. If, as a user, you claim a dapp used your data in a way you didn't allow, you can actually have a proof for it. And on the flip side, the dapp can say: no, no, no, here are the people we gave access to and here's how they accessed it.
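To make the third model concrete, here is a minimal, stdlib-only Python sketch of the commit-and-dispute idea: the access log lives off-chain, only a hash commitment goes on-chain, and anyone can later check a revealed log against that commitment. The log format and names are hypothetical, not from any real protocol or from Privy.

```python
import hashlib
import json

def log_digest(entries):
    # Deterministic hash of an off-chain access log (hypothetical format).
    blob = json.dumps(entries, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

# Off-chain: the dapp keeps a log of who accessed which field, and when.
access_log = [{"grantee": "henri", "field": "ssn", "ts": 1700000000}]

# On-chain: only the commitment (a hash) is posted, never the data itself.
onchain_commitment = log_digest(access_log)

# Dispute resolution: the dapp reveals its log, and anyone can check it
# against the commitment; a log that hides or alters an access won't match.
assert log_digest(access_log) == onchain_commitment
tampered = access_log + [{"grantee": "bob", "field": "ssn", "ts": 1700000100}]
assert log_digest(tampered) != onchain_commitment
```

A real system would use zero-knowledge proofs so the dapp could prove compliance without revealing the whole log; the hash commitment here only captures the settlement-layer intuition.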
So I think the chain as this public state, this verifiable resolution layer for data access and for identity data, is really, really important. It's what allows us to build systems that not only can be more trusted but, I think more importantly, are interoperable. Again, there are two parts to why blockchains are important here. The first is: let's make sure alpha dapp is not lying to you about what they're doing with your data. The second is: alpha dapp may exist for five years, for ten years, but your identity in the online world exists forever, or at least for as long as you're alive. And so we need a system that can be iterated upon, so that once alpha dapp ceases to exist or moves on to do something else, bravo dapp can use your identity, and charlie dapp can use your identity, and delta dapp can use your identity, and basically your identity, just as you, a living thing, can evolve through the interactions that you have. This is one of the parts where I'm super excited about the potential for interoperability in web3, which means we're not stuck with: you've chosen to buy into the Facebook ecosystem, so you either use their app or you lose all the data that was there. I think choice is really the important thing that blockchains get us.
So aside from being able to port your own data, you also talked about giving permission, which I think is really core, at least in my mind, to what it means to own your own data. Right. So I think this ties back really nicely to what you're building at Privy, right? Which to me means you're building a solution that will enable this kind of data sovereignty. So again, like I said, in my mind data sovereignty is the power to own your own data and decide who you want to give permission to and who you don't; that choice is at least yours. Help me define what data sovereignty means in a broader kind of landscape. And also, what are some of the real-world use cases Privy can actually enable?
Yeah. So I would define data sovereignty as probably three things. The first is expressive and revocable permissions, not simple, you know, predicates. And here I think back a lot to, say, GDPR consent banners: you either agree or you get the hell out of here. That is not an expressive permission. I think the permission should be a lot more nuanced, asking which data you want to share, not "if you don't share everything, then it can't work."
The second is data portability. I get to take my data with me wherever I go; I am the source of truth for that data. The state of the web today is that if I sign up to 19 services, my data exists in 19 different places, and I'm only as safe as the worst of those services that's taken on my data. It should be flipped: the data belongs with me, and I open or close access to various services.
And that gets me to the third point, which is centralized control. I use the word centralized here on purpose, a little tongue in cheek, because it should be centralized around me. I should be the person to decide, not some blockchain that I lose control over. So those are, I think, the three key features of data sovereignty: expressive and revocable permissions, data portability, and centralized control. And all of that comes together to create a world in which, as an end user, my data is fluid, follows me throughout the web, but acts as I do and with the amount of permanence that I want. It is the ability for me to control who has access to my data, for how long, and how much of it, at a high level.
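As an illustration of what "expressive and revocable" might mean in code, here is a small Python sketch of a permission grant that is scoped to specific fields, expires, and can be revoked by the user at any time. The `Grant` structure and its fields are hypothetical, invented for this example; they are not Privy's data model.

```python
import time
from dataclasses import dataclass

@dataclass
class Grant:
    # One permission: who may read which fields, and until when.
    grantee: str
    fields: frozenset
    expires_at: float      # expressive: scoped in time...
    revoked: bool = False  # ...and revocable at any moment by the user

    def allows(self, grantee, field_name, now=None):
        now = time.time() if now is None else now
        return (not self.revoked
                and grantee == self.grantee
                and field_name in self.fields
                and now < self.expires_at)

grant = Grant("alpha_dapp", frozenset({"email"}), expires_at=2_000_000_000)
assert grant.allows("alpha_dapp", "email", now=1_700_000_000)    # in scope
assert not grant.allows("alpha_dapp", "ssn", now=1_700_000_000)  # never shared
grant.revoked = True                                             # user revokes
assert not grant.allows("alpha_dapp", "email", now=1_700_000_000)
```

The contrast with a GDPR-style banner is the point: instead of one all-or-nothing boolean, each grant names the data, the party, and the lifetime, and withdrawal is a first-class operation.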
Now I think the really tricky thing, and this is the part that strikes me as one a lot of people working on data sovereignty forget, is that it takes two to tango. You have the user who's excited to use a product, but you have the developer who needs to be able to build. Another reason why Apple may have been quite successful is, say, Swift, or Objective-C before it: these are good languages, and Xcode is a fair IDE in which I can build good apps. So the question on the flip side is: how do you allow developers to build in a data-sovereign world? And this is what we're trying to get at, which is to say: how can we help level the playing field so developers can be responsible with user data and can build good products that respect data sovereignty on behalf of the user, but don't have to juggle all of the different parts that make up data identity? I think it's too much to ask somebody trying to build a dapp to have to reinvent the identity stack as a whole. And so I think our role is really to sit between the user and the developer: give the developer a toolkit that allows them to make choices about what data they need for their app, and that keeps the user in the driver's seat.
So to answer your question: Privy really is a simple way for developers to take better care of their user data. It's a pretty simple API: you make put calls or get calls directly from the browser, and it allows the developer to keep off-chain data in an encrypted-by-default, end-to-end encrypted data store associated by default with a user wallet, without having that data go on chain.
How does Privy actually work today, and how do you see it changing in the long term? And by the way, a lot of what you're saying actually made me think of a question, which is: do you think data sovereignty should be a design choice? Should it be on the users to think about this, or are devs solely responsible for making these choices?
It's a really great question. And I think the reality is it's a spectrum, but we should define where the spectrum begins. I'll give you an example. The version of data sovereignty, quote unquote, that we have today with GDPR, which is no data sovereignty at all, is basically a false choice. As a user, you come onto a site and you either get a fully broken experience, or the site turns you away, or you agree that they do whatever the hell they want with your data. The way privacy policies work today, where they can be updated out from under you and you're expected to read through legalese, is a false form of consent. So clearly that is not the right place for the spectrum to begin.

I think the spectrum should begin with a version of informed consent. The idea that, just by using the product, users have consented is not right. The fact is you don't know what you're agreeing to when you use a product. So finding the language and building UX such that users can consent to things, and, even if they don't control the underlying use of data, can verify how it gets used: that's one end of the spectrum. On the other end, the user gets to decide every single time they give you access to data.

From a UX perspective, if I'm about to share an email with a dapp, this is the difference between the dapp having to ask me for email access every time they want to send me an email, versus the dapp being able to send me email because I agreed to it once. And based on the piece of data being handled, one or the other might be the right solution. Say I shared my SSN with you: I potentially do want you to ask me every time you need to unlock it in order to generate a legal document or something like that. Whereas with email, I certainly don't want to have to be there and agree every time you want to send me one.
The point is that you should be able to reach out to me without me being there. So that's why I don't think there's a one-size-fits-all solution to data sovereignty. But I think it should sit somewhere between, on one side, a place where you have transparent controls and the user can trust but verify, and on the other side, a place where access control is so granular that you're basically throwing the kitchen sink at the user. Below the first, the user is unable to give informed consent at all. This is the part of crypto that kind of annoys me: when a developer says, well, if you're unhappy, you can just go ahead and fork it. Well, that's it, we're back to the same thing as the GDPR point, which is that it's a fake choice. And if you tell me, well, if you're unhappy as a user, you have all of these settings you can set yourself, just run your own node and so on, that's no solution at all. So I think this has to be based on good UX, and these have to be revocable permissions. Those are the two things without which you can't get data sovereignty.
You know what this reminds me of? It's not exactly related, but there's this one op-ed that I think about a lot. It's written by Tim Wu; he was a professor at Columbia and is now in the White House in charge of technology policy. He wrote this op-ed in The New York Times, I think in 2018 or a few years ago, titled The Tyranny of Convenience. What he says is basically that things are getting easier, and by "things" that can mean anything: it could be washing machines, it could be browsing web pages, it could be clicking "I accept" just to use something, right? He's basically saying that as tasks get easier, there's this growing expectation that convenience puts on everything else: it needs to be easy too, or be left behind. And so he argues that we become spoiled by that kind of immediacy, right? We get used to it, and then we get really annoyed by anything that requires any effort or time. What he's trying to get at is this easy-or-left-behind dichotomy, which, like you're saying, is actually kind of a false choice. Where he lands in the article is: embrace the struggle. Sometimes you want to go through the process. So anyway, it reminded me of what you're saying, which is that when you're given "accept or get out," that's not really a choice. And that applies to your data; that applies to really any experience, whether it's on the web or not.
But I think this is why it's so exciting that we're working on this at Privy, in web3, and that there are a lot of people working on data sovereignty in the web3 space. At the end of the day, this is a massive coordination problem: how do we build a better system for everybody, even though everybody does not care about it in the same way or to the same level? Beyond the technical and the UX choices, there's a coordination problem here. And this is one of the things web3 is really good at, because these systems are transparent and because they're interoperable.
We actually, hopefully, get to build a common bedrock for data privacy that is usable by developers, that allows me to build something that makes it easier to take better care of user data, but that also gives the user some floor of protection. And so I think a lot about this in ecological terms; animal welfare is the interesting example here. There's a lot of effort around labeling food items in supermarkets, so you can know that your eggs were laid by free-range chickens or that this is grass-fed beef you're buying. And this comes from a coordination effort. I suspect a lot of people don't care, but hopefully there's a crew of buyers who cares enough to make a purchasing-power decision, until regulation steps in and does its part too.
And so my core idea is that none of these solutions will be built in isolation; we need to come up with standards. To go back to your original question about how Privy works: Privy really takes care of three things: encryption and key management, permissions, and storage. When you type a piece of private data into your browser, a call is made to the Privy KMS, the key management system. That key management system is backed either by a secure server that only takes care of keys, which we host on your behalf (this is a hardware security module), or by your web3 wallet. Depending on which it is, we generate a new set of keys, and your private data is encrypted in the browser using those new keys. That ciphertext is then sent to the Privy storage system, gated by the permissioning system.

But the point I'm getting at is that this entire system is built to be super modular, because our assumption is that we can't be the ones to run everything over time. In fact, we shouldn't be. However, the first order of work for us is: let's get the interfaces right, let's get the UX right. Because I think if we don't get the UX right in the next two years, the tyranny of convenience will win out, and web3 as a whole is going to move to look a hell of a lot more like web2. It's going to move to a place where developers, because they don't have any other choice, are just dumping data into PostgreSQL, and basically web3 becomes a financial application once again. So in order for data ownership, for data sovereignty, to be a part of web3, we need to fix the UX problems and then we need to decentralize our systems, which is why we built Privy.
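The flow Henri describes, a client-side encrypt step with keys held by a KMS or wallet and only ciphertext reaching storage, can be sketched in a few lines of stdlib Python. This is not Privy's API or cipher; the keystream construction below is a toy stand-in used purely to show where the plaintext, key, and ciphertext live.

```python
import hashlib
import secrets

def keystream_xor(key, data):
    # Toy stream cipher for illustration only: a real system would use an
    # authenticated cipher (e.g. AES-GCM) from a vetted crypto library.
    out = bytearray()
    counter = 0
    while len(out) < len(data):
        out.extend(hashlib.sha256(key + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(b ^ k for b, k in zip(data, out))

# 1. The KMS (or the user's wallet) produces a fresh key for this record.
data_key = secrets.token_bytes(32)

# 2. The plaintext never leaves the client: encrypt in the browser.
plaintext = b"katherine@example.com"
ciphertext = keystream_xor(data_key, plaintext)

# 3. Only the ciphertext is sent to storage; the key stays with the
#    KMS/wallet, gated by the permissioning layer.
assert ciphertext != plaintext
assert keystream_xor(data_key, ciphertext) == plaintext  # round-trip
```

The design consequence is the one Henri draws: the storage provider can be swapped (cloud, IPFS, anything) without weakening privacy, because it only ever holds ciphertext.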
We've built it to be very modular, but today we run the infrastructure so that we can really focus on getting the user experience right and the developer experience right. Then we'll open up the infrastructure so that you can run your own node for the permissions agent that gates access to data, and so that you can store the data not necessarily in, say, our cloud provider, but on IPFS or wherever you will. Again, I think it's about the right tool for the right problem, and the right tool is not the same for everybody.
Do you think, with the way Privy works today or in the future, you would have built a different business model, or a different business at all, if you were building for the web2 world versus the web3 world? And how is it different?
So what we're striving for is to build an API that is the same in both worlds. I think that's one of the really exciting things for us, and frankly, it's because I think a lot of web2 API tooling is much better. I go to a lot of web3 websites and I feel like I have to get a Ph.D. in order to understand how the thing works. I feel like I have to read reams of docs; people are throwing white papers in my face, and I'm like, I just want to get something to work. So we built it with that bar in mind, which is to say that the experience should be indistinguishable whether you're using it for web2 or web3.

In that sense, the big difference is going to be where the infrastructure is actually running. For example, are you authenticating with your own system or with Okta, or are you authenticating with something like Sign-In with Ethereum or another key-signing system? And then, who's running the infrastructure? Is Privy running infrastructure on your behalf, like any hosted platform, or are you running it through a network of nodes? So the first distinction is that, at an architectural level, there will be differences in what's happening behind the scenes, even if those don't show up through the interfaces.

The second thing I want to get to is the fact that users control their own keys in web3, and I really think that is a transformative change. I think one of the most important issues in the space right now is the question of custody. Social recovery, key management, and better wallet interfaces are so important, because if we give up on non-custodial solutions, then a lot of the core advantages of web3 go away.
So the fact that in web3 users have their own keys, keys that today they're using to custody assets but that they can use to custody data, is to me a really essential part of the difference here. And this is something we're very interested in: how do we bridge the gap between web2 systems, web2 auth systems, and web3 systems when it comes to key management?

On the business model point, and this is the third and last piece: the reality is we're open sourcing a lot of our work, and we're excited to build part of this infrastructure stack that is going to have to exist around data infrastructure, in part because token economics, crypto economics, make it possible. I think the reality of the last 30 years of the web is that users will not pay for privacy and users will not pay for their own data management. Any solution that caters to users as a paid service, where you pay 20 bucks a month and you get privacy, is to me an unsound business model; both the tyranny of convenience and sheer force of habit make it unpalatable to the bottom-up user.

So the question then is: who should have to pay? Obviously the answer is the developer. But if I have a data store that is shared among apps A, B and C, if Uniswap and Mirror and Fractional all have access to my data and all use it, who should be paying for it? Right? Are they paying based on the number of accesses? What if I first signed up and built up my data store, my data vault, through Uniswap: should Uniswap be rewarded for onboarding me into a sovereign data system? This is where crypto economics are actually really excellent, because we can build this sort of massive coordination solution so that data accesses and data writes can either be paid out or accrue value to the people who are serving the underlying user data.
And so as a business, you have your own monetization model, but you're paying for the right to serve that customer. This is a major way in which a lot of the architecture design behind Privy has been approached: knowing that that is the end goal we're going towards, and that, I think, is how a data sovereignty system should work.
Do you think that all or most builders, and also users, within web3 share this ethos of having that choice, having data sovereignty? And does that make them the ideal kind of users for you, at least in the beginning? Because I think people are coming to web3 because they're a little bit fed up with web2 models, where you feel like you don't really have a choice, like we talked about, and all the power is centralized within one platform. So as a user, you don't have a choice. Do you think that makes it good ground to build Privy on, to build towards that? And do you think web2 companies or web2 users or web2 platforms can eventually also get onboarded?
Yeah. I mean, my contention is that the distinction is going to go away. In the future there will be no web2 or web3; the question will be how the infrastructure and the stack are built, and shared-state systems, which is a geeky way of saying blockchains, will be a part of major infrastructure. So 30 years on, I suspect that is exactly how things will end up, and web2 companies will have learned from it, because web2 companies will have integrated the best parts of web3 infrastructure within their stack, and vice versa.

To get a little lower level: not all people in web3 share that ethos, and I think that's completely fine. I've seen a number of builders in web3 who are really here because they get to build novel experiences, because they get to monetize in new ways, and because they're attracted by some of the freedoms that web3 gives them in terms of tooling, in terms of UX, in terms of all of that. And this is where, frankly, regulation is our friend: the regulatory landscape, in trying to protect users, has built the grounds for developers saying, I really don't want to touch user data, please get me out of this custodial nightmare. So whether you're doing it because you have great ideology and you want to empower users, or whether you're doing it because you don't want to deal with the legal overhead of owning user data, it leads to the same system: one in which the developer only has access to the data as long as the user still wishes. And I think the real question is going to come down to what users are willing to put up with in terms of friction and in terms of UX.
I have no doubt that users will determine how far the UX can evolve towards user empowerment, which is why we have to unlock really delightful UX around data management, because without that, developers will not sacrifice their business, their products, their dream in the name of building something that is pure from an ethos standpoint. So that's my take on it: thankfully, incentives have aligned in a way that's really unique in web3 today, which is why I think it's such an important time to be working on this issue, and I think the next couple of years are going to be key.

But one point I want to double down on: I don't think this is only going to come through pain or fear. It's not just "I worry about the SEC or the CFTC coming after me, so I don't want to know anything about my users." This is really about unlocking new experiences. This is the most boring example, but as a user, the fact that I could move from, say, New York to California, update my address once, and have it updated in all the systems I use online, or having a digital data passport that I can travel with: the sky's the limit with what you unlock if you build secure data systems that respect user wishes on top of decentralized rails. So I really am excited for us to get through this first phase of actually jump-starting the sovereign data machine, into a next phase, which is, I think, a far more human-centric web experience that we get thanks to this.
As our interactions with the world and other people become increasingly online, I really like this idea of a digital passport, right? A passport is mine and I own it, and if I want to go somewhere, that's my choice to use it. Taking that comparison, our data, where it goes, who gets to control it, who gets access to it, where I get to take it, just becomes ever more important. And I'm so glad you're building towards this vision. Henri, thank you so much for coming on the show, sharing the vision with us, and talking to us about a future where the choices are real, we avoid the tyranny of convenience, and we have true data sovereignty.
Thank you so much. Thanks for having me.
Thank you, everyone, for tuning into episode three of Cross-Chain Examination. If you haven't already, definitely go check out our first two episodes. Please like and subscribe wherever you're getting this podcast, whether it's on YouTube, Apple Podcasts, Spotify and the like. Leave me any thoughts, questions, or suggestions for future episodes. Thank you again, and I'll see you next week.