I dusted off some old code today to look at common word pairs in some customer data. NGRAM analysis finds the most common bi-grams (2 word combinations), tri-grams (3 word combinations), etc. The code is simple, and I share it here in case you ever need to do the same thing:
require 'zip/zipfilesystem'
def words text
text.downcase.scan(/[a-z]+/)
end
Zip::ZipFile.open('../text.txt.zip') { |zipFile| # training data
$words = words(zipFile.read('text.txt')) # is in a ZIP file
}
bi_grams = Hash.new(0)
tri_grams = Hash.new(0)
num = $words.length - 2
num.times {|i|
bi = $words[i] + ' ' + $words[i+1]
tri = bi + ' ' + $words[i+2]
bi_grams[bi] += 1
tri_grams[tri] += 1
}
puts "bi-grams:"
bb = bi_grams.sort{|a,b| b[1] a[1]}
(num / 10).times {|i| puts "#{bb[i][0]} : #{bb[i][1]}"}
puts "tri-grams:"
tt = tri_grams.sort{|a,b| b[1] a[1]}
(num / 10).times {|i| puts "#{tt[i][0]} : #{tt[i][1]}"}
Output might look like this:
bi-grams:
in the : 561
in Java : 213
...
tri-grams:
in the code : 119
Java source code : 78
...
Cool stuff. Ruby is my favorite language for tool building.
- Classic reasoning systems like Loom and PowerLoom vs. more modern systems based on probalistic networks [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Using Amazon's cloud service for computationally expensive calculations [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Software environments for working on AI projects [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- New version of my NLP toolkit [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Semantic Web: through the back door with HTML and CSS [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Java FastTag part of speech tagger is now released under the LGPL [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Defining AI and Knowledge Engineering [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Great Overview of Knowledge Representation [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Something like Google page rank for semantic web URIs [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- My experiences writing AI software for vehicle control in games and virtual reality systems [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- The URL for this blog has changed [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- I have a new page on Knowledge Management [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Good video: Knowledge Representation and the Semantic Web [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Using the PowerLoom reasoning system with JRuby [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Machines Like Us [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- RapidMiner machine learning, data mining, and visualization tool [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- texai.org [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- NLTK: The Natural Language Toolkit [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- My OpenCalais Ruby client library [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Ruby API for accessing Freebase/Metaweb structured data [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Protégé OWL Ontology Editor [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- New version of Numenta software is available [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Very nice: Elsevier IJCAI AI Journal articles now available for free as PDFs [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Verison 2.0 of OpenCyc is available [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- What’s Your Biggest Question about Artificial Intelligence? [Article] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Minimax Search [Knowledge] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Decision Tree [Knowledge] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- More AI Content & Format Preference Poll [Article] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- New Planners Solve Rescue Missions [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Neural Network Learns to Bluff at Poker [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Pushing the Limits of Game AI Technology [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Mining Data for the Netflix Prize [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Interview with Peter Denning on the Principles of Computing [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Decision Making for Medical Support [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Neural Network Creates Music CD [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- jKilavuz - a guide in the polygon soup [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Artificial General Intelligence: Now Is the Time [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Apply AI 2007 Roundtable Report [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- What Would You do With 80 Cores? [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Software Finds Learning Language Child's Play [News] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Artificial Intelligence in Games [Article] [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Artificial Intelligence Resources [Last Updated On: November 8th, 2009] [Originally Added On: November 8th, 2009]
- Alan Turing: Mathematical Biologist? [Last Updated On: April 25th, 2012] [Originally Added On: April 25th, 2012]
- BBC Horizon: The Hunt for AI ( Artificial Intelligence ) - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Can computers have true artificial intelligence" Masonic handshake" 3rd-April-2012 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Kevin B. Korb - Interview - Artificial Intelligence and the Singularity p3 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Artificial Intelligence - 6 Month Anniversary - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Science Breakthroughs [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Hitman: Blood Money - Part 49 - Stupid Artificial Intelligence! - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Research Members Turned Off By HAARP Artificial Intelligence - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Artificial Intelligence Lecture No. 5 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- The Artificial Intelligence Laboratory, 2012 - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Charlie Rose - Artificial Intelligence - Video [Last Updated On: April 30th, 2012] [Originally Added On: April 30th, 2012]
- Expert on artificial intelligence to speak at EPIIC Nights dinner [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- Filipino software engineers complete and best thousands on Stanford’s Artificial Intelligence Course [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- Vodafone xone™ Hackathon Challenges Developers and Entrepreneurs to Build a New Generation of Artificial Intelligence ... [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- Rocket Fuel Packages Up CPG Booster [Last Updated On: May 4th, 2012] [Originally Added On: May 4th, 2012]
- 2 Filipinos finishes among top in Stanford’s Artificial Intelligence course [Last Updated On: May 5th, 2012] [Originally Added On: May 5th, 2012]
- Why Your Brain Isn't A Computer [Last Updated On: May 5th, 2012] [Originally Added On: May 5th, 2012]
- 2 Pinoy software engineers complete Stanford's AI course [Last Updated On: May 7th, 2012] [Originally Added On: May 7th, 2012]
- Percipio Media, LLC Proudly Accepts Partnership With MIT's Prestigious Computer Science And Artificial Intelligence ... [Last Updated On: May 10th, 2012] [Originally Added On: May 10th, 2012]
- Google Driverless Car Ok'd by Nevada [Last Updated On: May 10th, 2012] [Originally Added On: May 10th, 2012]
- Moving Beyond the Marketing Funnel: Rocket Fuel and Forrester Research Announce Free Webinar [Last Updated On: May 10th, 2012] [Originally Added On: May 10th, 2012]
- Rocket Fuel Wins 2012 San Francisco Business Times Tech & Innovation Award [Last Updated On: May 13th, 2012] [Originally Added On: May 13th, 2012]
- Internet Week 2012: Rocket Fuel to Speak at OMMA RTB [Last Updated On: May 16th, 2012] [Originally Added On: May 16th, 2012]
- How to Get the Most Out of Your Facebook Ads -- Rocket Fuel's VP of Products, Eshwar Belani, to Lead MarketingProfs ... [Last Updated On: May 16th, 2012] [Originally Added On: May 16th, 2012]
- The Digital Disruptor To Banking Has Just Gone International [Last Updated On: May 16th, 2012] [Originally Added On: May 16th, 2012]
- Moving Beyond the Marketing Funnel: Rocket Fuel Announce Free Webinar Featuring an Independent Research Firm [Last Updated On: May 23rd, 2012] [Originally Added On: May 23rd, 2012]
- MASA Showcases Latest Version of MASA SWORD for Homeland Security Markets [Last Updated On: May 23rd, 2012] [Originally Added On: May 23rd, 2012]
- Bluesky Launches Drones for Aerial Surveying [Last Updated On: May 23rd, 2012] [Originally Added On: May 23rd, 2012]
- Artificial Intelligence: What happened to the hunt for thinking machines? [Last Updated On: May 25th, 2012] [Originally Added On: May 25th, 2012]
- Bubble Robots Move Using Lasers [VIDEO] [Last Updated On: May 25th, 2012] [Originally Added On: May 25th, 2012]
- UHV assistant professors receive $10,000 summer research grants [Last Updated On: May 27th, 2012] [Originally Added On: May 27th, 2012]
- Artificial intelligence: science fiction or simply science? [Last Updated On: May 28th, 2012] [Originally Added On: May 28th, 2012]
- Exetel taps artificial intelligence [Last Updated On: May 29th, 2012] [Originally Added On: May 29th, 2012]
- Software offers brain on the rain [Last Updated On: May 29th, 2012] [Originally Added On: May 29th, 2012]
- New Dean of Science has high hopes for his faculty [Last Updated On: May 30th, 2012] [Originally Added On: May 30th, 2012]
- Cognitive Code Announces "Silvia For Android" App [Last Updated On: May 31st, 2012] [Originally Added On: May 31st, 2012]
- A Rat is Smarter Than Google [Last Updated On: June 5th, 2012] [Originally Added On: June 5th, 2012]
- Why a rat is smarter than Google [Last Updated On: June 5th, 2012] [Originally Added On: June 5th, 2012]