“FLoC brings the model to the data instead of sending the data to the centralized model”

Overview 

Federated Learning is a relatively new and evolving machine learning technique that moves model training from a single central machine or data center out to many devices, including mobile phones. Federated Learning of Cohorts, or FLoC for short, is a form of web tracking enabled through Federated Learning in which individuals are grouped into “cohorts” based on similar browsing behavior.
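As a rough illustration of how browsing histories might be mapped to cohorts, the sketch below uses SimHash, a locality-sensitive hash under which similar inputs tend to produce the same output. This article does not specify the grouping algorithm, so the choice of SimHash, the 8-bit cohort size, the example domains, and the names `domain_hash` and `cohort_id` are all assumptions made for illustration only.

```python
import hashlib
from typing import Iterable


def domain_hash(domain: str, bits: int) -> int:
    """Stable pseudo-random bit pattern for a domain (SHA-256, truncated)."""
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") & ((1 << bits) - 1)


def cohort_id(domains: Iterable[str], bits: int = 8) -> int:
    """SimHash sketch: every visited domain votes on each bit; the majority wins.

    Hypothetical stand-in for a cohort assignment step, not a real browser API.
    """
    votes = [0] * bits
    for domain in set(domains):  # each distinct domain votes once
        h = domain_hash(domain, bits)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the cohort ID when more domains voted 1 than 0.
    return sum(1 << i for i in range(bits) if votes[i] > 0)


if __name__ == "__main__":
    # Hypothetical browsing histories that differ by a single site.
    alice = ["news.example", "shop.example", "cars.example", "travel.example"]
    bob = ["news.example", "shop.example", "cars.example", "garden.example"]
    print(cohort_id(alice), cohort_id(bob))
```

The cohort ID is computed on the device from the browsing history, and only the short ID would be exposed to websites. Histories that share most of their domains agree on most votes, so they are more likely, though not guaranteed, to land in the same cohort.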

Machine learning is a branch of artificial intelligence and computer science that uses data and algorithms to make computers mimic human learning and decision-making. Federated Learning takes advantage of edge computing principles, bringing computation and data storage closer to where they are needed. This approach reduces response times, conserves bandwidth, and enables personalization, among other benefits. This article provides an easy-to-digest overview of Federated Learning of Cohorts for business professionals and others interested in the new technology.

What is Federated Learning?

As with any machine learning task, data is necessary to train and retrain algorithmic models to improve on them, identify patterns, make decisions, and bring enhanced features to a system.

But data is everywhere and often siloed in different locations, making it hard to access, especially within a business organization that produces data across many sources. Other considerations include privacy issues, regulatory requirements, and a lack of resources to bring segmented data together in a single location. A typical machine learning project establishes data pipelines to a central location, where the data is cleaned, pre-processed, and used for training. Federated Learning aims to improve this process by bringing the model being trained to the devices that generate the data. As Li et al. put it, Federated Learning “explores training statistical models directly on remote devices.”

For example, mobile phones and other devices download the current model, improve it by training on local data, then summarize the changes as a small “update.” This “update” is sent to the cloud, where it is combined with “updates” from many other devices to produce an improved shared model. The communication between the Federated Learning enabled device and the cloud service that aggregates the updates is secured and encrypted. These encrypted, private “updates” are only computed and sent when doing so does not negatively impact the user experience on the device.
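The download, local training, and aggregation loop described above can be sketched as a simplified form of federated averaging. This is a minimal sketch under stated assumptions, not a production implementation: the two-parameter linear model, the learning rate, the sample data, and the function names `local_update`, `aggregate`, and `run_rounds` are all hypothetical, and the encryption of updates in transit is omitted.

```python
from typing import List, Tuple

Weights = List[float]                # [slope, intercept] of a tiny linear model
Dataset = List[Tuple[float, float]]  # (x, y) pairs that never leave the device


def local_update(global_w: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 20) -> Weights:
    """Device step: train a copy of the model locally, return only the change."""
    w, b = global_w
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w - global_w[0], b - global_w[1]]


def aggregate(global_w: Weights, updates: List[Weights],
              sizes: List[int]) -> Weights:
    """Server step: apply the updates, weighted by each device's sample count."""
    total = sum(sizes)
    new_w = list(global_w)
    for delta, n in zip(updates, sizes):
        for i, d in enumerate(delta):
            new_w[i] += d * n / total
    return new_w


def run_rounds(devices: List[Dataset], rounds: int = 50) -> Weights:
    """Repeat download, local training, and aggregation for several rounds."""
    weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(weights, data) for data in devices]
        weights = aggregate(weights, updates, [len(d) for d in devices])
    return weights


if __name__ == "__main__":
    # Three hypothetical devices, each holding private samples of y = 2x + 1.
    devices = [
        [(0.0, 1.0), (1.0, 3.0)],
        [(2.0, 5.0), (3.0, 7.0), (4.0, 9.0)],
        [(-1.0, -1.0), (0.5, 2.0)],
    ]
    print(run_rounds(devices))
```

Note that the server only ever sees the weight changes, never the raw `(x, y)` samples. Weighting each update by sample count is one common design choice; a real deployment would also need to handle devices that drop out mid-round.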

Why is Federated Learning important?

Federated Learning is important for several reasons; chiefly, it introduces a new paradigm for how machine learning can be implemented. Its techniques pave the way for decentralized AI and give practitioners new tools for tackling machine learning problems at scale. More specifically, Federated Learning enhances the machine learning process through “faster deployment and testing of smarter models, lower latency, and less power consumption, all while ensuring privacy (Bhattacharya, 2019)”.

Federated Learning also enables the target device to immediately use and take advantage of the evolving machine learning algorithm that the device itself is helping to improve. For example, according to the Google AI blog, “the improved model on your phone can also be used immediately, powering experiences personalized by the way you use your phone (McMahan & Ramage, 2017)”.  

Federated Learning of Cohorts Powered by Federated Learning Algorithms

It is important to remember that Federated Learning of Cohorts is powered by Federated Learning (machine learning) models and aims to improve web advertising while phasing out third-party cookies (small pieces of data used to track users across different websites). FLoC is a new protocol developed by Google as part of its “Privacy Sandbox” initiative, a long-term effort to eliminate third-party cookies that could uproot a long-standing digital advertising ecosystem. More specifically, Google’s “Privacy Sandbox” seeks to transform digital advertising through “a secure environment for personalization that also protects user privacy (Schuh, 2019)”. Online privacy is at the core of the criticism surrounding the design of FLoC and Federated Learning techniques. Currently, third-party cookies offer little privacy protection, and developers and ad agencies can also derive unique identifiers using “fingerprinting” techniques. Schuh notes, “with fingerprinting, developers have found ways to use tiny bits of information that vary between users, such as what device they have or what fonts they have installed to generate a unique identifier which can then be used to match a user across websites (Schuh, 2019)”.
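The fingerprinting idea that Schuh describes can be sketched in a few lines: several attributes that are individually unremarkable are combined into one stable identifier. The attribute names and values below are hypothetical, as is the `fingerprint` function; real fingerprinting scripts collect many more signals.

```python
import hashlib
from typing import Mapping


def fingerprint(attributes: Mapping[str, str]) -> str:
    """Combine low-entropy attributes into one stable identifier (illustrative)."""
    # Sort the keys so the same attributes always yield the same string.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    # Hypothetical signals a page might read without using any cookie.
    visitor = {
        "device": "Pixel 4",
        "fonts": "Arial,Roboto,Noto Sans",
        "timezone": "America/New_York",
    }
    print(fingerprint(visitor))
```

Because the identifier is recomputed from the device itself on every visit, clearing cookies does not reset it, which is why fingerprinting is harder for users to control.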

Although this model paves the way for exciting engineering and user experience use cases, it has also drawn criticism.

Who should care about Federated Learning of Cohorts? 

The ability to share behavioral labels across the web in a manner that enables ‘learning’ of unique user behavior in an automated and algorithmically governed fashion has the potential to create a future where every interaction online can be leveraged to manipulate you.

Privacy Harms and Criticism of Federated Learning of Cohorts

Brave (the company behind the Brave Browser) and the Electronic Frontier Foundation (EFF) are prominent critics of Google’s FLoC and “Privacy Sandbox” initiatives. Brave states that “FLoC shares information about your browsing behavior with sites and advertisers that otherwise wouldn’t have access to that information (Snyder & Eich, 2021).” Although your browsing information is anonymized by placing you in a cohort of users with similar interests and browsing habits, FLoC does not prevent the sites you visit from learning things about you specifically. More simply put, your visits to websites A and B shape a cohort label that website C can read, which allows website C to learn about behavior and interests that you would not otherwise have shared with it. Here is a brief excerpt from Brave:

“For example, you may have an existing account with Walgreens, possibly to fill prescriptions. Walgreens necessarily knows who you are. FLoC tells Walgreens things that Walgreens has no business knowing about you (not a pseudonymous you or a cohort including you, but your Walgreens login identifies you) based on your browsing behavior.

Chrome telling Walgreens (and Twitter, and GitHub, and Facebook, and any other site you have an account with) is unquestionably harming your privacy by telling sites information about you that they otherwise wouldn’t have, the information you didn’t decide to share with those sites, and information that is likely unrelated to why you chose to visit those sites where you do login (Snyder & Eich, 2021).” 

The Electronic Frontier Foundation weighs in on this topic by highlighting how FLoC will “avoid the privacy risks of third-party cookies, but it will create new ones in the process (Cyphers, 2021).”

Federated Learning of Cohorts is a tool for a new digital frontier in which emerging technologies can improve user experience and offer great value. Although Federated Learning can bring new capabilities to our devices by enabling a new paradigm of machine learning and AI techniques, it also paves the way for tools that can be used maliciously. In this instance, FLoC may not have been designed with malicious intent, but it does open up futures in which these technologies produce unintended consequences. The EFF contrasts two possible futures:

“In one, users get to decide what information to share with each site they choose to interact with. No one needs to worry that their past browsing will be held against them—or leveraged to manipulate them—when they next open a tab. In the other, each user’s behavior follows them from site to site as a label, inscrutable at a glance but rich with meaning to those in the know. Their recent history, distilled into a few bits, is “democratized” and shared with dozens of nameless actors that take part in the service of each web page. Users begin every interaction with a confession: here’s what I’ve been up to this week, please treat me accordingly.”

(Cyphers, 2021)

In either instance, it is imperative for technologists, designers, and large corporations to build diverse teams that bring many perspectives to bear as emerging technologies are developed. Federated Learning is an exciting technology that can do good in this world, but, as with any other technology, its use may have unintended consequences (as Federated Learning of Cohorts illustrates).

Figure: FLoC model diagram, depicting the 'model to the data instead of sending the data to the centralized model' approach.

Bibliography

Bhattacharya, D. S. (2019, February 2). The New Dawn of AI: Federated Learning. Medium. https://towardsdatascience.com/the-new-dawn-of-ai-federated-learning-8ccd9ed7fc3a.

Cyphers, B. (2021, March 3). Google’s FLoC Is a Terrible Idea. Electronic Frontier Foundation. https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea.

Marboer. (2021, April 6). Federated Learning: Introduction to Federated Learning. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2021/04/federated-learning-for-beginners/.

McMahan, B., & Ramage, D. (2017, April 6). Federated Learning: Collaborative Machine Learning without Centralized Training Data. Google AI Blog. https://ai.googleblog.com/2017/04/federated-learning-collaborative.html.

Schuh, J. (2019, August 22). Building a more private web. Google. https://www.blog.google/products/chrome/building-a-more-private-web/.

Snyder, P., & Eich, B. (2021, April 20). Why Brave Disables FLoC. Brave Browser. https://brave.com/why-brave-disables-floc/. 

Useful Links: 

https://hackernoon.com/a-beginners-guide-to-federated-learning-b29e29ba65cf

https://ai.googleblog.com/2017/04/federated-learning-collaborative.html

https://hackernoon.com/federated-learning-a-decentralized-form-of-machine-learning-nr4635rg

https://federated.withgoogle.com/

https://towardsdatascience.com/the-new-dawn-of-ai-federated-learning-8ccd9ed7fc3a

https://www.ibm.com/blogs/research/2020/08/ibm-federated-learning-machine-learning-where-the-data-is/

https://www.techradar.com/news/theres-more-to-googles-floc-than-meets-the-eye

https://www.eff.org/deeplinks/2021/03/googles-floc-terrible-idea

https://techcrunch.com/2021/03/24/google-isnt-testing-flocs-in-europe-yet/ 

https://techcrunch.com/2019/08/22/google-proposes-new-privacy-and-anti-fingerprinting-controls-for-the-web/

https://brave.com/why-brave-disables-floc/

Credits: 

Brief prepared by Richard Martinez, 3rd-year Ph.D. student, based on reference materials from Ali Jaffar, founder of Key Medium and Huge Thinking.