by Sina Pilehchiha, Concordia University
Trail of Bits has manually curated a wealth of datayears of security assessment reportsand now were exploring how to use this data to make the smart contract auditing process more efficient with Slither-simil.
Based on accumulated knowledge embedded in previous audits, we set out to detect similar vulnerable code snippets in new clients codebases. Specifically, we explored machine learning (ML) approaches to automatically improve on the performance of Slither, our static analyzer for Solidity, and make life a bit easier for both auditors and clients.
Currently, human auditors with expert knowledge of Solidity and its security nuances scan and assess Solidity source code to discover vulnerabilities and potential threats at different granularity levels. In our experiment, we explored how much we could automate security assessments to:
Slither-simil, the statistical addition to Slither, is a code similarity measurement tool that uses state-of-the-art machine learning to detect similar Solidity functions. When it began as an experiment last year under the codename crytic-pred, it was used to vectorize Solidity source code snippets and measure the similarity between them. This year, were taking it to the next level and applying it directly to vulnerable code.
Slither-simil currently uses its own representation of Solidity code, SlithIR (Slither Intermediate Representation), to encode Solidity snippets at the granularity level of functions. We thought function-level analysis was a good place to start our research since its not too coarse (like the file level) and not too detailed (like the statement or line level.)
Figure 1: A high-level view of the process workflow of Slither-simil.
In the process workflow of Slither-simil, we first manually collected vulnerabilities from the previous archived security assessments and transferred them to a vulnerability database. Note that these are the vulnerabilities auditors had to find with no automation.
After that, we compiled previous clients codebases and matched the functions they contained with our vulnerability database via an automated function extraction and normalization script. By the end of this process, our vulnerabilities were normalized SlithIR tokens as input to our ML system.
Heres how we used Slither to transform a Solidity function to the intermediate representation SlithIR, then further tokenized and normalized it to be an input to Slither-simil:
Figure 2: A complete Solidity function from the contract TurtleToken.sol.
Figure 3: The same function with its SlithIR expressions printed out.
First, we converted every statement or expression into its SlithIR correspondent, then tokenized the SlithIR sub-expressions and further normalized them so more similar matches would occur despite superficial differences between the tokens of this function and the vulnerability database.
Figure 4: Normalized SlithIR tokens of the previous expressions.
After obtaining the final form of token representations for this function, we compared its structure to that of the vulnerable functions in our vulnerability database. Due to the modularity of Slither-simil, we used various ML architectures to measure the similarity between any number of functions.
Figure 5: Using Slither-simil to test a function from a smart contract with an array of other Solidity contracts.
Lets take a look at the function transferFrom from the ETQuality.sol smart contract to see how its structure resembled our query function:
Figure 6: Function transferFrom from the ETQuality.sol smart contract.
Comparing the statements in the two functions, we can easily see that they both contain, in the same order, a binary comparison operation (>= and <=), the same type of operand comparison, and another similar assignment operation with an internal call statement and an instance of returning a true value.
As the similarity score goes lower towards 0, these sorts of structural similarities are observed less often and in the other direction; the two functions become more identical, so the two functions with a similarity score of 1.0 are identical to each other.
Research on automatic vulnerability discovery in Solidity has taken off in the past two years, and tools like Vulcan and SmartEmbed, which use ML approaches to discovering vulnerabilities in smart contracts, are showing promising results.
However, all the current related approaches focus on vulnerabilities already detectable by static analyzers like Slither and Mythril, while our experiment focused on the vulnerabilities these tools were not able to identifyspecifically, those undetected by Slither.
Much of the academic research of the past five years has focused on taking ML concepts (usually from the field of natural language processing) and using them in a development or code analysis context, typically referred to as code intelligence. Based on previous, related work in this research area, we aim to bridge the semantic gap between the performance of a human auditor and an ML detection system to discover vulnerabilities, thus complementing the work of Trail of Bits human auditors with automated approaches (i.e., Machine Programming, or MP).
We still face the challenge of data scarcity concerning the scale of smart contracts available for analysis and the frequency of interesting vulnerabilities appearing in them. We can focus on the ML model because its sexy but it doesnt do much good for us in the case of Solidity where even the language itself is very young and we need to tread carefully in how we treat the amount of data we have at our disposal.
Archiving previous client data was a job in itself since we had to deal with the different solc versions to compile each project separately. For someone with limited experience in that area this was a challenge, and I learned a lot along the way. (The most important takeaway of my summer internship is that if youre doing machine learning, you will not realize how major a bottleneck the data collection and cleaning phases are unless you have to do them.)
Figure 7: Distribution of 89 vulnerabilities found among 10 security assessments.
The pie chart shows how 89 vulnerabilities were distributed among the 10 client security assessments we surveyed. We documented both the notable vulnerabilities and those that were not discoverable by Slither.
This past summer we resumed the development of Slither-simil and SlithIR with two goals in mind:
We implemented the baseline text-based model with FastText to be compared with an improved model with a tangibly significant difference in results; e.g., one not working on software complexity metrics, but focusing solely on graph-based models, as they are the most promising ones right now.
For this, we have proposed a slew of techniques to try out with the Solidity language at the highest abstraction level, namely, source code.
To develop ML models, we considered both supervised and unsupervised learning methods. First, we developed a baseline unsupervised model based on tokenizing source code functions and embedding them in a Euclidean space (Figure 8) to measure and quantify the distance (i.e., dissimilarity) between different tokens. Since functions are constituted from tokens, we just added up the differences to get the (dis)similarity between any two different snippets of any size.
The diagram below shows the SlithIR tokens from a set of training Solidity data spherized in a three-dimensional Euclidean space, with similar tokens closer to each other in vector distance. Each purple dot shows one token.
Figure 8: Embedding space containing SlithIR tokens from a set of training Solidity data
We are currently developing a proprietary database consisting of our previous clients and their publicly available vulnerable smart contracts, and references in papers and other audits. Together theyll form one unified comprehensive database of Solidity vulnerabilities for queries, later training, and testing newer models.
Were also working on other unsupervised and supervised models, using data labeled by static analyzers like Slither and Mythril. Were examining deep learning models that have much more expressivity we can model source code withspecifically, graph-based models, utilizing abstract syntax trees and control flow graphs.
And were looking forward to checking out Slither-simils performance on new audit tasks to see how it improves our assurance teams productivity (e.g., in triaging and finding the low-hanging fruit more quickly). Were also going to test it on Mainnet when it gets a bit more mature and automatically scalable.
You can try Slither-simil now on this Github PR. For end users, its the simplest CLI tool available:
Slither-simil is a powerful tool with potential to measure the similarity between function snippets of any size written in Solidity. We are continuing to develop it, and based on current results and recent related research, we hope to see impactful real-world results before the end of the year.
Finally, Id like to thank my supervisors Gustavo, Michael, Josselin, Stefan, Dan, and everyone else at Trail of Bits, who made this the most extraordinary internship experience Ive ever had.
Recent Articles By Author
*** This is a Security Bloggers Network syndicated blog from Trail of Bits Blog authored by Nol Ponthieux. Read the original post at: https://blog.trailofbits.com/2020/10/23/efficient-audits-with-machine-learning-and-slither-simil/
Visit link:
Efficient audits with machine learning and Slither-simil - Security Boulevard
- Are We Overly Infatuated With Deep Learning? - Forbes [Last Updated On: August 18th, 2024] [Originally Added On: December 28th, 2019]
- CMSWire's Top 10 AI and Machine Learning Articles of 2019 - CMSWire [Last Updated On: August 18th, 2024] [Originally Added On: December 28th, 2019]
- Can machine learning take over the role of investors? - TechHQ [Last Updated On: August 18th, 2024] [Originally Added On: December 28th, 2019]
- Pear Therapeutics Expands Pipeline with Machine Learning, Digital Therapeutic and Digital Biomarker Technologies - Business Wire [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Dell's Latitude 9510 shakes up corporate laptops with 5G, machine learning, and thin bezels - PCWorld [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Limits of machine learning - Deccan Herald [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Forget Machine Learning, Constraint Solvers are What the Enterprise Needs - - RTInsights [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Tiny Machine Learning On The Attiny85 - Hackaday [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Finally, a good use for AI: Machine-learning tool guesstimates how well your code will run on a CPU core - The Register [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- How Will Your Hotel Property Use Machine Learning in 2020 and Beyond? | - Hotel Technology News [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Technology Trends to Keep an Eye on in 2020 - Built In Chicago [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- AI and machine learning trends to look toward in 2020 - Healthcare IT News [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- The 4 Hottest Trends in Data Science for 2020 - Machine Learning Times - machine learning & data science news - The Predictive Analytics Times [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- The Problem with Hiring Algorithms - Machine Learning Times - machine learning & data science news - The Predictive Analytics Times [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Going Beyond Machine Learning To Machine Reasoning - Forbes [Last Updated On: August 18th, 2024] [Originally Added On: January 11th, 2020]
- Doctor's Hospital focused on incorporation of AI and machine learning - EyeWitness News [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Being human in the age of Artificial Intelligence - Deccan Herald [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Raleys Drive To Be Different Gets an Assist From Machine Learning - Winsight Grocery Business [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Break into the field of AI and Machine Learning with the help of this training - Boing Boing [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- BlackBerry combines AI and machine learning to create connected fleet security solution - Fleet Owner [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- What is the role of machine learning in industry? - Engineer Live [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Seton Hall Announces New Courses in Text Mining and Machine Learning - Seton Hall University News & Events [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Christiana Care offers tips to 'personalize the black box' of machine learning - Healthcare IT News [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Leveraging AI and Machine Learning to Advance Interoperability in Healthcare - - HIT Consultant [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Essential AI & Machine Learning Certification Training Bundle Is Available For A Limited Time 93% Discount Offer Avail Now - Wccftech [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Educate Yourself on Machine Learning at this Las Vegas Event - Small Business Trends [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- 2020: The year of seeing clearly on AI and machine learning - ZDNet [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- How machine learning and automation can modernize the network edge - SiliconANGLE [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Five Reasons to Go to Machine Learning Week 2020 - Machine Learning Times - machine learning & data science news - The Predictive Analytics Times [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Don't want a robot stealing your job? Take a course on AI and machine learning. - Mashable [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Adventures With Artificial Intelligence and Machine Learning - Toolbox [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Optimising Utilisation Forecasting with AI and Machine Learning - Gigabit Magazine - Technology News, Magazine and Website [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Machine Learning: Higher Performance Analytics for Lower ... [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Machine Learning Definition [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Machine Learning Market Size Worth $96.7 Billion by 2025 ... [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Difference between AI, Machine Learning and Deep Learning [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Machine Learning in Human Resources Applications and ... [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Pricing - Machine Learning | Microsoft Azure [Last Updated On: August 18th, 2024] [Originally Added On: January 19th, 2020]
- Looking at the most significant benefits of machine learning for software testing - The Burn-In [Last Updated On: August 18th, 2024] [Originally Added On: January 22nd, 2020]
- New York Institute of Finance and Google Cloud Launch A Machine Learning for Trading Specialization on Coursera - PR Web [Last Updated On: August 18th, 2024] [Originally Added On: January 22nd, 2020]
- Uncover the Possibilities of AI and Machine Learning With This Bundle - Interesting Engineering [Last Updated On: August 18th, 2024] [Originally Added On: January 22nd, 2020]
- Red Hat Survey Shows Hybrid Cloud, AI and Machine Learning are the Focus of Enterprises - Computer Business Review [Last Updated On: August 18th, 2024] [Originally Added On: January 22nd, 2020]
- Machine learning - Wikipedia [Last Updated On: August 18th, 2024] [Originally Added On: January 22nd, 2020]
- Vectorspace AI Datasets are Now Available to Power Machine Learning (ML) and Artificial Intelligence (AI) Systems in Collaboration with Elastic -... [Last Updated On: August 18th, 2024] [Originally Added On: January 22nd, 2020]
- Learning that Targets Millennial and Generation Z - HR Exchange Network [Last Updated On: August 18th, 2024] [Originally Added On: January 23rd, 2020]
- Machine learning and eco-consciousness key business trends in 2020 - Finfeed [Last Updated On: August 18th, 2024] [Originally Added On: January 24th, 2020]
- Jenkins Creator Launches Startup To Speed Software Testing with Machine Learning -- ADTmag - ADT Magazine [Last Updated On: August 18th, 2024] [Originally Added On: January 24th, 2020]
- Research report investigates the Global Machine Learning In Finance Market 2019-2025 - WhaTech Technology and Markets News [Last Updated On: August 18th, 2024] [Originally Added On: January 25th, 2020]
- Expert: Don't overlook security in rush to adopt AI - The Winchester Star [Last Updated On: August 18th, 2024] [Originally Added On: January 25th, 2020]
- Federated machine learning is coming - here's the questions we should be asking - Diginomica [Last Updated On: August 18th, 2024] [Originally Added On: January 25th, 2020]
- I Know Some Algorithms Are Biased--because I Created One - Scientific American [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- Iguazio Deployed by Payoneer to Prevent Fraud with Real-time Machine Learning - Business Wire [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- Want To Be AI-First? You Need To Be Data-First. - Forbes [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- How Machine Learning Will Lead to Better Maps - Popular Mechanics [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- Technologies of the future, but where are AI and ML headed to? - YourStory [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- In Coronavirus Response, AI is Becoming a Useful Tool in a Global Outbreak - Machine Learning Times - machine learning & data science news - The... [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- This tech firm used AI & machine learning to predict Coronavirus outbreak; warned people about danger zones - Economic Times [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- 3 books to get started on data science and machine learning - TechTalks [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- JP Morgan expands dive into machine learning with new London research centre - The TRADE News [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- Euro machine learning startup plans NYC rental platform, the punch list goes digital & other proptech news - The Real Deal [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- The ML Times Is Growing A Letter from the New Editor in Chief - Machine Learning Times - machine learning & data science news - The Predictive... [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- Top Machine Learning Services in the Cloud - Datamation [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- Combating the coronavirus with Twitter, data mining, and machine learning - TechRepublic [Last Updated On: August 18th, 2024] [Originally Added On: February 1st, 2020]
- Itiviti Partners With AI Innovator Imandra to Integrate Machine Learning Into Client Onboarding and Testing Tools - PRNewswire [Last Updated On: August 18th, 2024] [Originally Added On: February 2nd, 2020]
- Iguazio Deployed by Payoneer to Prevent Fraud with Real-time Machine Learning - Yahoo Finance [Last Updated On: August 18th, 2024] [Originally Added On: February 2nd, 2020]
- ScoreSense Leverages Machine Learning to Take Its Customer Experience to the Next Level - Yahoo Finance [Last Updated On: August 18th, 2024] [Originally Added On: February 2nd, 2020]
- How Machine Learning Is Changing The Future Of Fiber Optics - DesignNews [Last Updated On: August 18th, 2024] [Originally Added On: February 2nd, 2020]
- How to handle the unexpected in conversational AI - ITProPortal [Last Updated On: August 18th, 2024] [Originally Added On: February 5th, 2020]
- SwRI, SMU fund SPARKS program to explore collaborative research and apply machine learning to industry problems - TechStartups.com [Last Updated On: August 18th, 2024] [Originally Added On: February 5th, 2020]
- Reinforcement Learning (RL) Market Report & Framework, 2020: An Introduction to the Technology - Yahoo Finance [Last Updated On: August 18th, 2024] [Originally Added On: February 5th, 2020]
- ValleyML Is Launching a Series of 3 Unique AI Expo Events Focused on Hardware, Enterprise and Robotics in Silicon Valley - AiThority [Last Updated On: August 18th, 2024] [Originally Added On: February 5th, 2020]
- REPLY: European Central Bank Explores the Possibilities of Machine Learning With a Coding Marathon Organised by Reply - Business Wire [Last Updated On: August 18th, 2024] [Originally Added On: February 5th, 2020]
- VUniverse Named One of Five Finalists for SXSW Innovation Awards: AI & Machine Learning Category - PRNewswire [Last Updated On: August 18th, 2024] [Originally Added On: February 5th, 2020]
- AI, machine learning, robots, and marketing tech coming to a store near you - TechRepublic [Last Updated On: August 18th, 2024] [Originally Added On: February 5th, 2020]
- Putting the Humanity Back Into Technology: 10 Skills to Future Proof Your Career - HR Technologist [Last Updated On: August 18th, 2024] [Originally Added On: February 6th, 2020]
- Twitter says AI tweet recommendations helped it add millions of users - The Verge [Last Updated On: August 18th, 2024] [Originally Added On: February 6th, 2020]
- Artnome Wants to Predict the Price of a Masterpiece. The Problem? There's Only One. - Built In [Last Updated On: August 18th, 2024] [Originally Added On: February 6th, 2020]
- Machine Learning Patentability in 2019: 5 Cases Analyzed and Lessons Learned Part 1 - Lexology [Last Updated On: August 18th, 2024] [Originally Added On: February 6th, 2020]
- The 17 Best AI and Machine Learning TED Talks for Practitioners - Solutions Review [Last Updated On: August 18th, 2024] [Originally Added On: February 6th, 2020]
- Overview of causal inference in machine learning - Ericsson [Last Updated On: August 18th, 2024] [Originally Added On: February 6th, 2020]