Security, privacy, and cloud: 3 examples of why research matters to IT – The Enterprisers Project

When you're busy running around putting out fires, it's easy to dismiss research as something that may be interesting for university professors and their students but doesn't exactly merit bandwidth from a busy IT professional. While it's almost certainly true that it shouldn't be a primary focus, I hope to convince you that it deserves at least a little bit of your attention.

Previously, I've written about why quantum computing in general and quantum-resistant cryptography in particular, even in their early stages, are of more than academic interest to anyone charting the future course of a technology-focused organization. Here, I'm going to take you through a few of the forward-looking topics covered in the newest Red Hat Research Quarterly issue and connect them to challenges that IT professionals face today.

[ How can automation free up more staff time for innovation? Get the free eBook: Managing IT with Automation. ]

The cryptography that underpins much of software security is critical and is certainly the subject of a great deal of ongoing research. The issue even contains an article by Vojtěch Polášek that describes research into transforming easy-to-remember passwords into secure cryptographic keys using derivation functions. However, of perhaps more immediate interest to IT pros is Martin Ukrop's usability research.
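To make the idea of a derivation function concrete, here is a minimal sketch using Python's standard library. It uses PBKDF2; the research described in the issue may well use a different function, and the password and parameters below are purely illustrative:

```python
import hashlib
import os

def derive_key(password: str, salt: bytes, iterations: int = 600_000) -> bytes:
    # Stretch a human-memorable password into a 32-byte key.
    # A random salt and a high iteration count make brute-force
    # attacks against the password far more expensive.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations, dklen=32)

salt = os.urandom(16)          # stored alongside the ciphertext, not secret
key = derive_key("correct horse battery staple", salt)
print(len(key))                # a 32-byte key suitable for symmetric encryption
```

The same password and salt always yield the same key, which is what lets a memorable passphrase stand in for a stored cryptographic secret.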

For the past few years, Ukrop, a PhD candidate at the Centre for Research on Cryptography and Security at Masaryk University in the Czech Republic, has conducted experiments at the DevConf.cz open source event. These experiments revolve around X.509 certificates: their generation, validation, and understanding. Ukrop explains this focus: "Nowadays, most developers need secure network connections somewhere in their products. Today, that mostly means using TLS [Transport Layer Security], which, in turn, most likely means validating the authenticity of the server by validating its certificate. Furthermore, it turns out that understanding all the various quirks and corners of certificate validation is far from straightforward. OpenSSL, one of the most widely used libraries for TLS, has almost 80 distinct error states related only to certificate validation."

One experiment, conducted in 2018 and likely relevant to many developers, investigated how much developers trust flawed TLS certificates. Participants were presented with certificate validation errors, asked to investigate the issue, assess the connection's trustworthiness, and describe the problem in their own words. Ukrop's conclusion was that some certificate cases were overtrusted. For example, about 20 percent of the participants considered both a self-signed certificate and one with violated name constraints as "looking OK" or better; most security professionals would disagree.
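The kind of validation failure the participants were judging is easy to reproduce. A sketch using Python's `ssl` module, which delegates to OpenSSL under the hood (the commented-out host name is illustrative):

```python
import socket
import ssl

def check_certificate(host: str, port: int = 443) -> str:
    # Attempt a TLS handshake with full certificate validation,
    # as a well-configured client library would by default.
    ctx = ssl.create_default_context()  # verifies the chain and the host name
    try:
        with socket.create_connection((host, port), timeout=5) as sock:
            with ctx.wrap_socket(sock, server_hostname=host):
                return "certificate OK"
    except ssl.SSLCertVerificationError as err:
        # err.verify_message surfaces OpenSSL's specific error state,
        # e.g. "self-signed certificate" -- one of the many distinct
        # validation failures Ukrop's experiments draw on.
        return f"validation failed: {err.verify_message}"

# check_certificate("self-signed.badssl.com")  # would report a self-signed cert
```

The point of the default context is that it refuses exactly the certificates some of the study participants rated as "looking OK."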

Ukrop's work aims to improve security usability for developers; the work in progress can be found at https://x509errors.org. In the meantime, it suggests that training developers to better handle certain types of security errors might have a good payoff.

Another area of interest to IT leaders, which I've written about previously, relates to the complications of balancing data-sharing needs with privacy protection. That was the topic of an interview that Sherard Griffin, a director at Red Hat in the AI Center of Excellence, conducted with James Honaker and Mercè Crosas of Harvard University. Honaker is a researcher at the Harvard John A. Paulson School of Engineering and Applied Sciences, while Crosas is Chief Data Science and Technology Officer at Harvard's Institute for Quantitative Social Science.

Griffin lays out a common challenge faced by many organizations, including his own: "The datasets we needed from a partner to create certain machine learning models had to have a fair amount of information. Unfortunately, the vendor had challenges sharing that data, because it had sensitive information in it." In Harvard's case, it is a challenge they face with Dataverse, which Crosas describes as "a software platform enabling us to build a real data repository to share research datasets. The emphasis is on publishing datasets associated with research that is already published. Another use of the platform is to create datasets that could be useful for research and making them available more openly to our research communities."

Harvard's approach to guaranteeing individual privacy when a shared dataset like Dataverse is exposed to researchers: use differential privacy. It's a relatively new technique, which came out of work primarily by Cynthia Dwork in 2006, but it is starting to see widespread use, including by the US Census Bureau in 2020. So it's certainly not of just academic interest at this point.

Differential privacy works by adding a small amount of noise, sufficient to drown out the contribution of any one individual in the dataset. Making it harder to tease out individual data points from an aggregated set isn't a new thing, of course. The difference is that differential privacy approaches privacy guarantees in a mathematically rigorous way.

As Honaker puts it: "The point is to balance that noise exactly [between making the data useless and exposing individual data points]; that's why the ability to reason formally about these algorithms is so important. There's a tuning parameter called epsilon. If an adversary, for example, has infinite computational power, knows algorithmic tricks that haven't even been discovered yet, epsilon tells you the worst-case leakage of information from a query." Some of the ongoing research in this area involves the tuning of that parameter and dealing with cases where it can get used up by repeated queries.
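One common way to realize this tradeoff is the Laplace mechanism; the sketch below is illustrative only, not Harvard's implementation, and the dataset and query are made up:

```python
import random

def dp_count(data, predicate, epsilon: float) -> float:
    # Differentially private count via the Laplace mechanism.
    true_count = sum(1 for row in data if predicate(row))
    # A count changes by at most 1 when one person joins or leaves the
    # dataset (sensitivity 1), so Laplace noise with scale 1/epsilon is
    # enough to mask any single individual's contribution.
    scale = 1.0 / epsilon
    # The difference of two exponentials is Laplace-distributed.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_count + noise

ages = [34, 29, 41, 18, 55]
# Smaller epsilon -> more noise -> stronger privacy, less accuracy.
noisy = dp_count(ages, lambda a: a >= 30, epsilon=0.5)
```

Honaker's point about repeated queries shows up directly here: each call to `dp_count` spends some of the privacy budget, so an analyst cannot simply average away the noise by asking the same question many times without accounting for the accumulated epsilon.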

[ Check out our primer on 10 key artificial intelligence terms for IT and business leaders: Cheat sheet: AI glossary. ]

The final topic that I'll touch on here is AIOps, which Red Hat's Marcel Hild researches in the Office of the CTO. This emerging area recognizes that open source code is only a part of what's needed to implement and operate services based on that code. Hild argues: "We need to open up what it takes to stand up and operate a production-grade cloud. This must not only include architecture documents, installation, and configuration files, but all the data that is being produced in that procedure: metrics, logs, and tickets. You've probably heard the AI mantra that data is the new gold multiple times, and there is some deep truth to it. Software is no longer the differentiating factor: it's the data."

Hild acknowledges that the term AIOps can be a bit nebulous. But he sees it as meaning to augment IT operations with the tools of AI, "which can happen on all levels, starting with data exploration. If a DevOps person uses a Jupyter notebook to cluster some metrics, I would call it an AIOps technique." He adds that the road to the self-driving cluster is paved with a lot of data: labeled data.
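The notebook exercise Hild describes might look something like the following minimal sketch: a tiny k-means pass (pure Python, with made-up latency and error-rate numbers) that groups hosts by their metrics so a degraded pair stands out:

```python
import random

random.seed(0)  # fixed seed so the toy run is reproducible

def kmeans(points, k, iters=20):
    # Tiny k-means: assign each point to its nearest centroid,
    # then move each centroid to the mean of its members.
    centroids = random.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        centroids = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# (latency_ms, error_rate) per host -- illustrative numbers
metrics = [(12, 0.01), (14, 0.02), (13, 0.01), (250, 0.30), (240, 0.28)]
centroids, clusters = kmeans(metrics, k=2)
# Two groups emerge: three healthy hosts, and a degraded pair worth a look.
```

In practice a DevOps engineer would reach for a library rather than hand-roll the algorithm, but the shape of the task is the same: metrics in, clusters out, anomalies surfaced for a human, and, eventually, labeled for a model.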

Fittingly, much of this research is itself taking place in the open, such as with the evolving open cloud community at the Mass Open Cloud. "All discussions happen in public meetings and, even better, are tracked in a Git repository, so we can involve all parties early in the process and trace back how we came to a certain decision. That's key, since the decision process is as important as the final outcome. All operational data will be accessible, and it will be easy to run a workload there and to get access to backend data," writes Hild.

To read more about these examples, read back issues, or sign up for a complimentary subscription to Red Hat Research Quarterly (print or digital).
