Bengals_12th_Man Posted June 9, 2006

Data Mining on the Internet with Google

Google has quickly become one of the best-known words in the world and is used by millions daily, including myself. In an advanced database class back in university, we spent a couple of weeks studying the inner workings of search engines, and one topic that came up was data mining using Google. Much to my surprise, out of a class of 80 fourth-year computer engineers, maybe four or five knew how to use Google to perform any sort of advanced query.

Google (and many other search engines) can search not only on keywords but also with a more "database-ish" query language that really narrows down your search results. Below is a summary of a few of the most useful lesser-known features. Note: in the examples, replace cwire.org with your own domain.

Basic Usage:

Use quotation marks " " to locate an entire string.
eg. "bill gates conference" will only return results with that exact string.

Mark essential words with a +
If a search must contain certain words or phrases, mark them with a + symbol. eg: +"bill gates" conference will return all results containing "bill gates" but not necessarily those pertaining to a conference.

Negate unwanted words with a -
You may search for the term bass, meaning the fish, and get back a list of music links as well. To narrow down your search a bit more, try: bass -music. This will return all results with "bass" and NOT "music".

General Tips: (I use many of these almost on a daily basis)

site:www.cwire.org
This will search only pages which reside on this domain.

related:www.cwire.org
This will display all pages which Google finds to be related to your URL.

link:www.cwire.org
This will display a list of all pages which Google has found to be linking to your site.
Useful to see how popular your site is.

spell:word
Runs a spell check on your word.

define:word
Returns the definition of the word.

stocks: [symbol, symbol, etc]
Returns stock information. eg. stocks: msft

maps:
A shortcut to Google Maps.

phone: name_here
Attempts to look up the phone number for a given name.

cache:
Shows the version of a web page that Google has in its cache. If you include other words in the query, Google will highlight those words within the cached document. For instance, cache:www.cwire.org web will show the cached content with the word "web" highlighted.

info:
The query [info:] will present some information that Google has about a web page. For instance, info:www.cwire.org will show information about the CyberWyre homepage. Note there can be no space between the "info:" and the web page url.

weather:
Used to find the weather in a particular city. eg. weather: new york

Advanced Tips:

filetype:
Searches for a specific file type, or, if you put a minus sign (-) in front of it, leaves out any results with that filetype. Try it with .mp3, .mpg or .avi if you like.

daterange:
Is supported in Julian date format only. 2452384 is an example of a Julian date.

allinurl:
If you start a query with [allinurl:], Google will restrict the results to those with all of the query words in the url. For instance, [allinurl: google search] will return only documents that have both "google" and "search" in the url.

inurl:
If you include [inurl:] in your query, Google will restrict the results to documents containing that word in the url. For instance, [inurl:google search] will return documents that mention the word "google" in their url, and mention the word "search" anywhere in the document (url or no). Note there can be no space between the "inurl:" and the following word.

allintitle:
If you start a query with [allintitle:], Google will restrict the results to those with all of the query words in the title.
For instance, [allintitle: google search] will return only documents that have both "google" and "search" in the title.

intitle:
If you include [intitle:] in your query, Google will restrict the results to documents containing that word in the title. For instance, [intitle:google search] will return documents that mention the word "google" in their title, and mention the word "search" anywhere in the document (title or no). Note there can be no space between the "intitle:" and the following word.

allinlinks:
Searches only within links, not text or title.

allintext:
Searches only within text of pages, but not in the links or page title.

bphonebook:
If you start your query with bphonebook:, Google shows U.S. business white page listings for the query terms you specify. For example, [ bphonebook: google mountain view ] will show the phonebook listing for Google in Mountain View.

phonebook:
If you start your query with phonebook:, Google shows all U.S. white page listings for the query terms you specify. For example, [ phonebook: Krispy Kreme Mountain View ] will show the phonebook listing of Krispy Kreme donut shops in Mountain View.

rphonebook:
If you start your query with rphonebook:, Google shows U.S. residential white page listings for the query terms you specify. For example, [ rphonebook: John Doe New York ] will show the phonebook listings for John Doe in New York (city or state). Abbreviations like [ rphonebook: John Doe NY ] generally also work.

Putting it all Together:

Now that we have the basics, it's time to get creative and combine them into a single search to really narrow down our results.

Example #1: Search for some MP3s

Let's say you're a Beatles fan and want to see if you can find some of their songs on the Internet without using Kazaa, etc.
Try this query:

"index of" + "mp3" + "beatles" -html -htm -php

or you could try this query:

* "index of/mp3" -playlist -html -lyrics beatles

Right away, in the first few results returned by Google, you can download MP3s.

Example #2: Mixing some techniques together

Here's a simple exercise. We'll mix a few terms to get more accurate results. Let's say we want to research sleep recommendations. One assumption could be that research papers on this topic would most likely be on an educational website, perhaps with a .edu domain. We could try this query:

sleep recommendations site:edu

Maybe you're in my situation and are thinking of applying to grad school. Let's see if we can find the Graduate Studies Admissions Requirements at the University of Toronto. We could try this query:

grad school admission requirements site:utoronto.ca

Summary:

After reading this article, you might be thinking "well, I could probably find those results without remembering these advanced search terms". Well, the truth is that you probably could. The reason to start using these advanced search tips is that they will help you find what you're looking for faster. They greatly help narrow down the results, and more often than not, the information you were looking for will be in the first two or three results.

http://www.cwire.org/data-mining-using-google/
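Since all of the operators above are just text glued onto the query, they are easy to put together in a script. Here is a minimal Python sketch of that idea, not anything official from Google. The helper names (build_query, julian_day, daterange, search_url) are made up for illustration, the daterange:START-END form is an assumption based on how the operator is commonly described, and the search URL is the ordinary public one, also assumed here.

```python
from datetime import date
from urllib.parse import urlencode

# Offset between Python's date ordinal (day 1 = 0001-01-01) and the
# Julian Day Number used by the daterange: operator.
JDN_OFFSET = 1721425


def julian_day(d: date) -> int:
    """Return the Julian Day Number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def build_query(*terms, exclude=(), **operators):
    """Join plain terms, -excluded words and operator:value pairs.

    Hypothetical helper: it only concatenates text and does not check
    that an operator name is one Google actually supports.
    """
    parts = list(terms)
    parts += [f"-{word}" for word in exclude]
    parts += [f"{name}:{value}" for name, value in operators.items()]
    return " ".join(parts)


def daterange(start: date, end: date) -> str:
    """Build a daterange: term, assuming the START-END Julian form."""
    return f"daterange:{julian_day(start)}-{julian_day(end)}"


def search_url(query: str) -> str:
    """URL-encode a query string into a search URL (assumed endpoint)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(build_query("bass", exclude=["music"]))
    print(build_query("grad school admission requirements", site="utoronto.ca"))
    print(julian_day(date(2002, 4, 19)))
    print(search_url(build_query("sleep recommendations", site="edu")))
```

The Julian date 2452384 from the daterange: tip works out to April 19, 2002, which is what julian_day(date(2002, 4, 19)) returns. Keeping the query as plain text until the very last step means you can paste the result of build_query straight into the search box to check it.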
Guest ThurmanMunster Posted June 11, 2006

i think this should be pinned.
Guest BengalBacker Posted June 11, 2006

I've used a couple of the features described. Very helpful, great post.
Guest schotzee Posted June 11, 2006

[quote name='ThurmanMunster' post='280790' date='Jun 11 2006, 05:11 AM']i think this should be pinned.[/quote]

Ditto. Excellent info, 12th.