When considering data privacy and protections, there is no data more important than personal data, whether that’s medical, financial, or even social. The discussions around access to our data, or even our metadata, becomes about who knows what, and if my personal data is safe. Today’s announcement between Intel, Microsoft, and DARPA, is a program designed around keeping information safe and encrypted, but still using that data to build better models or provide better statistical analysis without disclosing the actual data. It’s called Fully Homomorphic Encryption, but it is so computationally intense that the concept is almost useless in practice. This program between the three companies is a driver to provide IP and silicon to accelerate the compute, enabling a more secure environment for collaborative data analysis.

Mind Your Data

Data protection is one of the most important aspects to the future of computing. The volume of personal data is continually growing, as well as the value of that data, and the number of legal protections required. This makes any processing of personal, private, and confidential data difficult, often resulting in dedicated data silos, because any processing requires data transfer coupled with encryption/decryption, involving trust that isn’t always possible. All it takes is for one key in the chain to be lost or leaked, and the dataset is compromised.

There is a way around this, known as Fully Homomorphic Encryption (FHE). FHE enables the ability to take encrypted data, transfer it to where it needs to go, perform calculations on it, and get results without ever knowing the exact underlying dataset.

Take for example, analyzing medical data records: if a researcher needs to process a specific data-set for some analysis, the traditional method would be to encrypt the data, send the data, decrypt the data, and process it – but giving the researcher access to the specifics in the records might not be legal or face regulatory challenges. With FHE, that researcher can take the encrypted data, perform the analysis and get a result, without ever knowing any specifics of the dataset. This might involve combined statistical analysis of a population over multiple encrypted datasets, or taking those encrypted datasets and using them as additional inputs to train machine learning algorithms, enhancing the accuracy through having more data. Of course, the researcher has to have trust that the data given is complete and genuine, however that is arguably a different topic than enabling compute on encrypted data.

One of the issues as to why this matters is because the best insights from data come from the largest datasets. This includes being able to train a neural network, and the best neural networks are coming against issues of not having enough data, or are facing regulatory hurdles when it comes to the sensitive nature of that data. This is why Fully Homomorphic Encryption, the ability to analyze data without knowing its contents, matters.

Fully Homomorphic Encryption, as a concept, has been around for several decades, however the concept has only been realized in the last 20 years or so. A number of partial homomorphic encryption schemes were presented in that initial timeframe, and since 2010 several PHE/FHE designs able to process basic operations on encrypted data or cyphertexts have been developed with a number of libraries developed with industry standards. Some of these are open source. A lot of these methods are computationally complex for obvious reasons due to dealing with encrypted data, although efforts are being made with SIMD-like packing and other features to accelerate processing. Even though FHE schemes are being accelerated, this isn’t the same as decryption, because the math doesn’t decrypt the data - because the data is always in an encrypted state, it can (arguably) be used by untrusted third parties as the underlying information is never exposed. (One might argue that a sufficient dataset could reveal more than intended despite being encrypted.)

Today’s Announcement: Custom Silicon for FHE

When measuring the performance of FHE compute, the result is compared to the same analysis against the plain text version of the data. Because of the computational complexity of FHE compute, current compute methods are substantially slower. Encryption methods to enable FHE can increase the size of the data by 100-1000x, and then compute on that data is 10000x to 1 million times slower than conventional compute. That means that one second of compute on the raw data can take from 3 hours to 12 days.

So whether that means combining hospital medical records over a state, or customizing a personal service using personal metadata gathered on a user’s smartphone, FHE at that scale is no longer a viable solution. Enter the DARPA DPRIVE program.

  • DARPA: Defense Advanced Research Projects Agency
  • DPRIVE: Data Protection in Virtual Environments

Intel has announced that as part of the DPRIVE program, it has signed an agreement with DARPA to develop custom IP leading to silicon to enable faster FHE in the cloud, specifically with Microsoft on both Azure and JEDI cloud, initially with the US government. As part of this multi-year project, expertise from Intel Labs, Intel’s Design Engineering, and Intel’s Data Platforms Group will come together to create a dedicated ASIC to reduce the computational overhead of FHE over existing CPU-based methods. The press release states that the target is to reduce processing time by five orders of magnitude from current methods, reducing compute times from days to minutes.

Intel already has a foot in the door when it comes to FHE, having a research team inside Intel Labs dedicated to the issue. This has primarily been on the side of software, standards, and regulatory hurdles, but will now also move into hardware design, cloud software stacks, and collaborative deployment within Azure and JEDI cloud for the US government. Other highlighted target markets include healthcare, insurance, and finance.

During the Intel Labs day in December 2020, Intel detailed some of the direction it was already going in with this work, along with standards and development to parallel traditional encryption but at an international scale given the additional regulatory hurdles. Microsoft will now become part of that discussion with the DPRIVE program, along with Intel’s continued investments at the academic level.

Aside from the ‘five orders of magnitude’ element, today’s announcement doesn’t go beyond that in creating definitive goals, nor does it present a time-frame, instead saying this is a ‘multi-year’ agreement. It will be interesting to see how much Intel or their academic affiliations discuss on the topic beyond today, beyond the work standardization.

Related Reading

 

 

Comments Locked

31 Comments

View All Comments

  • JayBird50 - Tuesday, March 30, 2021 - link

    If i understand the article HE allow/supports actions on "encrypted" data. If the data can be acted upon then the data has been "conditioned" in a predictable/repeatable manner. This is not really encryption, whose results are quantifiable in strength. Is there any data available on the strength of the HE algorithm/process? Also if the process is predictable/actionable then the key to the process would be fixed so more like a stenographic approach.

Log in

Don't have an account? Sign up now