The Google search engine, which gave its name to the company Google, is the most widely used search engine on the Internet worldwide. In 2009, it accounted for 67% of search engine use on the Internet [1].
Contents
* 1 Principles and characteristics
o 1.1 The PageRank system
o 1.2 Simplicity and paid keywords
o 1.3 Infrastructure
o 1.4 Logos
o 1.5 Beta
* 2 Services
* 3 Tips for using the Google search engine
o 3.1 Search terms
o 3.2 Logical operators (Boolean)
o 3.3 Limitations
o 3.4 Dates
o 3.5 Sorting results
o 3.6 Additional functions
o 3.7 SearchWiki
o 3.8 Special characters
* 4 Misuse of Google
o 4.1 Positioning contests
o 4.2 Google bomb
o 4.3 Google Fight
o 4.4 Google Whacks
o 4.5 "Fake Google"
* 5 Limitations and errors of Google
o 5.1 The size of the database
o 5.2 Search effectiveness
* 6 Evaluation of the engine
* 7 Controversies
o 7.1 Controversy about the influence of the content of the results displayed
o 7.2 Tiananmen case
o 7.3 BMW Germany case
o 7.4 Keywords case in France
o 7.5 Controversy over the number of results displayed
* 8 Notes and references
* 9 See also
o 9.1 Related articles
o 9.2 External links
Principles and characteristics
The PageRank system
Google's operating principle, which made its success, rests on an invention of its creators: PageRank. The more links from other pages point to a document (link popularity), the higher its PageRank. The higher its PageRank, the more likely the document is to appear among the first search results. This system therefore indicates how "popular" a document is among other web documents.
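The principle described above can be sketched in Python as a power iteration. This is a minimal illustration under stated assumptions, not Google's implementation:

* The link graph is a small hypothetical dictionary.
* The damping factor defaults to the 0.85 value quoted later in this article.
* The (1-d) term is divided by the number of pages N, so that the ranks sum to one.
* Pages with no outgoing links spread their rank evenly over all pages. This is a common convention, assumed here.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=100):
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to. This uses the
    normalized form PR(A) = (1 - d)/N + d * sum(PR(T)/C(T)), so that the
    ranks form a probability distribution summing to one.
    """
    # Collect every page, including pages that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - d) / n + d * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = rank[page] / len(targets)   # PR(T) / C(T)
                for t in targets:
                    new[t] += d * share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: C receives links from A, B and D.
web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(sorted(ranks, key=ranks.get, reverse=True))  # → ['C', 'A', 'B', 'D']
```

Page C, which receives links from three pages, comes first. Page D, which no page links to, keeps only the minimum (1-d)/N.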
This principle was an immediate success because it produced more relevant results than other search engines, which simply matched the keywords inserted in the pages of sites. It also makes possible what is called Google bombing.
Simplicity and paid keywords
This search engine is also popular for its speed and its uncluttered design: no Flash, no flashing banners, etc. Its interface has inspired other engines such as Yahoo.
This simplicity, far from being anecdotal, is at least partly behind the site's success. When Google launched, the fashion was for search engines embedded in pages heavily loaded with content and advertising. These pages were often slow to appear and difficult to read.
Google nevertheless relies on a paid system, AdWords ("ad words"). This system sets a price per word that depends on demand: the more a word is requested, the more expensive each click on it becomes. However, users can always block the display of these advertisements with plugins such as CustomizeGoogle for Firefox.
Infrastructure
Around 2002, Google claimed to distribute the load over 10,000 PCs running a modified Linux kernel. A peak figure of 1,000 simultaneous requests was also frequently cited. The actual figures appear to be ten times higher. However, they are kept secret, notably so that the investment required to compete with Google cannot easily be calculated.
* Google and Akamai: Cult of Secrecy vs. Kingdom of Openness
Google uses robots named Googlebot, which visit at regular intervals all the websites that have asked to be indexed. This keeps up to date the database that answers queries.
Main article: Google Platform.
Logos
Apart from the official logo [1], the site uses special logos for certain holidays and events: the Google Doodles. They are created by Dennis Hwang, an American designer of Korean origin aged 23. They appear whenever a local or international holiday (New Year, national holidays, etc.) or an event (the Olympic Games, the commemoration of a person, etc.) calls for it.
All the holiday and event logos put online on www.google.com since 1999 are available here. More specifically, those that appeared in France are available here.
Beta
A beta label usually indicates that a program is nearing completion. At Google, it has become a trademark of sorts, affixed to most services and software except the search engine and the advertising services.
The advantage of the term "beta" is that it does not commit Google to any obligation of result regarding quality of service, since the product is still in development. It can also signal that Google is constantly improving its products.
This Google peculiarity has become a fashion, reflected in its competitors' increasingly overt use of the label.
Services
The search engine is available in 35 languages, and its interface is offered in over 100 languages.
Although Google is basically a search engine for web pages, it has gradually extended its coverage:

* to other types of documents (PDF, Microsoft Word, Flash, etc.);
* to images;
* to Usenet newsgroups, through Google Groups, since the purchase of Deja News.

The web2news service gives access to forums on Google.
It now also has a directory section for finding sites by category (the dmoz directory, ranked by PageRank). It also has a news portal that gathers the sites of major newspapers and news agencies.
Google's huge popularity and its highly diversified development policy (advertising links, purchases of databases and forum archives) eventually raised concerns about possible abuses of that power. Indeed, it is sometimes enough to "google" a person's name to obtain thorough personal information about them.
Google also offers a growing number of ancillary functions, available either through the normal Google search field or as web applications.
Tips for using the Google search engine
Google offers a simple form and an advanced search form. The advanced form lets you exclude words or search for complete expressions (see here for other advanced features).
Search terms
Google's documentation on how it interprets queries is fairly spartan. Observed changes in its behavior suggest that this is deliberate, to keep maximum freedom to make changes. The following must therefore be continually checked and updated to track those changes.
* H2O is searched as a single word, so Google does not find documents containing H 2 O or H2 O in their text. These are found by searching for "H 2 O". H-2-O (see the role of the hyphen) finds both H2O and H 2 O. Unfortunately, the hyphen operator only searches the two extreme combinations (all parts joined or all parts separate): it does not find H2 O.
* word: a word and its variants (singular/plural, masculine/feminine, with or without accents). For example, pommel horse also finds pommel horses. This algorithm works in French and English but not in Dutch (it does not know the plural in "-en"). Note: the variant you type is favored when sorting the results.
* ~word: a word and its synonyms. It uses an English dictionary, even for searches in French and Dutch! Try the query ~car -car to see the words found beyond the strict automotive sense. ~arabic returns Egypt, Lebanon, Arab and Hindu...! The source of the synonyms is unknown.
* "word": an exact word. Google ignores accents when searching but favors the specified form when sorting the results.
* "word ... word": a series of specific words (an expression).
* "* word word": within a series of words in quotes (and only there), an asterisk can replace one or more complete words you do not wish to specify. For example: "* Ministry of Trade and Commerce"
* site:www...: a domain of origin. It may be more or less general and can even specify top-level domains. For example: site:org OR site:com
* intitle:"word ... word": a series of specific words in the document title (tag and/or first tag...)
* +word: searches for the word even if it is a stop word in the user's language (such as "plus" in French). It also takes accents into account literally (e.g. +dé). A "+" is assumed when a single word is searched: tea alone is searched as if you had typed +tea. This differs sharply from AltaVista, where "+" marks required words. When sorting documents, Google gives preference to the typed form, so the "+" operator is no longer of much interest.
* word-word: a term made of several words, whether written with hyphens, with spaces or with no space at all. For example, gratte-ciel (skyscraper) finds gratte ciel, gratte-ciel and gratteciel. gratte -ciel does not mean the same thing at all as gratte-ciel (see the "-" operator). Warning: va-nu-pieds finds va nu pieds and vanupieds but not va nupieds.
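The hyphen behavior described in this list can be modeled with a small function. A hyphenated term matches the all-joined and all-separate forms, but no intermediate mix. This is a hypothetical model of the observed behavior, not Google's code. The name hyphen_variants is illustrative only.

```python
def hyphen_variants(term):
    """Forms a hyphenated query term is observed to match (hypothetical model).

    Only the two extremes are produced, all parts joined and all parts
    separated, plus the hyphenated spelling itself; mixes such as "H2 O"
    are deliberately left out.
    """
    parts = [p for p in term.split("-") if p]
    if len(parts) < 2:
        return [term]  # no hyphen: the term is searched as one word
    return ["".join(parts), " ".join(parts), "-".join(parts)]


print(hyphen_variants("H-2-O"))        # → ['H2O', 'H 2 O', 'H-2-O']
print(hyphen_variants("va-nu-pieds"))  # → ['vanupieds', 'va nu pieds', 'va-nu-pieds']
```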
Logical operators (Boolean)
* Space: documents must contain both what is to the right and what is to the left. Google's sorting favors documents in which the specified words are close to each other (see below).
* OR or |: documents may contain what is to the right or what is to the left. Note: OR must be written in capital letters!
* Space followed by - (minus sign): excludes documents containing the following word (NOT).
* (...): a sub-expression evaluated before the surrounding operations.
GoogleGuide gives other examples. The US site HotBot provides a Google search form that is sometimes more convenient than Google's own.
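The operators listed in this section and under Search terms combine into a single query string. The sketch below assembles one from Python values. The function build_query and its parameter names are hypothetical. Only the operator syntax (spaces, quotes, OR, -, parentheses, site:, intitle:) is taken from the lists above.

```python
def build_query(all_words=(), phrase=None, any_words=(), exclude=(),
                site=None, intitle=None):
    """Assemble a query string from the operators described above (sketch).

    all_words: words separated by spaces (implicit AND)
    phrase:    an exact expression, wrapped in double quotes
    any_words: alternatives joined by OR (capital letters required)
    exclude:   words prefixed with a minus sign (NOT)
    site, intitle: restriction operators
    """
    parts = list(all_words)
    if phrase:
        parts.append(f'"{phrase}"')
    if any_words:
        # Parentheses group the OR alternatives as a sub-expression.
        parts.append("(" + " OR ".join(any_words) + ")")
    parts.extend(f"-{word}" for word in exclude)
    if site:
        parts.append(f"site:{site}")
    if intitle:
        parts.append(f'intitle:"{intitle}"')
    return " ".join(parts)


print(build_query(all_words=["pagerank"], any_words=["algorithm", "formula"],
                  exclude=["seo"], site="edu"))
# → pagerank (algorithm OR formula) -seo site:edu
```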
Limitations
* Queries are limited to 32 words.
* Only the first 1,000 results relevant to a query are available, even if there are more matches. There can sometimes be fewer than 1,000 results because several pages from the same site are removed. According to Google, offering more than 1,000 results would impose a heavy load for a need that is actually rather rare.
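The two limits above amount to simple arithmetic that a client can apply before and after querying. This sketch assumes 10 results per page, a common setting assumed here rather than stated in the text. The helper names are hypothetical.

```python
import math

MAX_QUERY_WORDS = 32   # query length limit stated above
MAX_RESULTS = 1000     # only the first 1,000 results are viewable


def check_query_length(query):
    """Return the query's words, raising if the 32-word limit is exceeded."""
    words = query.split()
    if len(words) > MAX_QUERY_WORDS:
        raise ValueError(f"{len(words)} words; the limit is {MAX_QUERY_WORDS}")
    return words


def reachable_pages(total_hits, per_page=10):
    """Number of result pages a user can actually browse (per_page assumed)."""
    viewable = min(total_hits, MAX_RESULTS)
    return math.ceil(viewable / per_page)


print(reachable_pages(2_500_000))  # → 100, however many hits are announced
print(reachable_pages(45))         # → 5
```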
In theory, the sorting puts the most useful references first (this is difficult to verify).
Dates
* When searching by date, the date used is the date of indexing in the database (i.e. the visit of Google's "spider"). It is not the actual publication date of the page (as reported by the HTTP server).
* In the advanced search form, you can restrict the search to the last 3, 6 or 12 months.
* The daterange: operator, written daterange:start-end with both dates given as Julian day numbers, lets you specify another date range. The HotBot form offers the same option. A Julian day number is the number of days elapsed since 1 January 4713 BC (in the Julian calendar). The http://www.numerical-recipes.com/julian.html site can help you calculate it.
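Calendar dates can be converted to Julian day numbers with Python's standard datetime module. Its date.toordinal() counts days from 1 January of year 1 in the proleptic Gregorian calendar, and adding the constant offset 1721425 gives the Julian day number. The daterange_operator helper is hypothetical. It assumes the operator takes the two day numbers separated by a hyphen.

```python
from datetime import date

# Python's date.toordinal() gives 1 for 1 January of year 1 (proleptic
# Gregorian calendar); that day is Julian Day Number 1721426.
JDN_OFFSET = 1721425


def julian_day(d):
    """Julian Day Number of a calendar date (the day beginning at noon)."""
    return d.toordinal() + JDN_OFFSET


def daterange_operator(start, end):
    """Hypothetical helper: build a daterange: operator for start..end."""
    if end < start:
        raise ValueError("end must not precede start")
    return f"daterange:{julian_day(start)}-{julian_day(end)}"


print(julian_day(date(2000, 1, 1)))                              # → 2451545
print(daterange_operator(date(2009, 1, 1), date(2009, 12, 31)))
# → daterange:2454833-2455197
```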
Sorting results
Google's quality comes from its ability to show first the pages deemed most relevant, both in general and for a particular search. Google sorts the documents found according to:
* Measures of the quality of the site in general and of each page (for example, consistency of the meta-information with the visible text of the page). These measures are undocumented or poorly documented.
* A measure of the weight of each indexed page: this is the PageRank algorithm, described in the following passage quoted from Google:
We assume page A has pages T1...Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
$$PR(A) = (1-d) + d\left(\frac{PR(T_1)}{C(T_1)} + \cdots + \frac{PR(T_n)}{C(T_n)}\right)$$
Note that the PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks will be one. PageRank or PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web. See also: [2]
* An assessment of the relevance of the page to the search performed. This takes into account:
o the presence in the page of the search words (possibly extended to their synonyms or variants, singular/plural)
o the location of these words on the page (title, metadata, text) or in links to this page. The latter can raise ethical problems, because a page can be indexed under words that others use to describe it. (Try "miserable failure": the author of the target page certainly did not seek this description!)
o the tf-idf formula, computed for each word. It takes into account the number of occurrences of the word in the page, weighted by the inverse of the relative frequency of this word in the part of the web indexed by Google:
$$w_i = tf_i \times \log\left(\frac{D}{df_i}\right)$$
+ $tf_i$ = frequency of term i in the page
+ $df_i$ = number of web pages containing term i
+ $D$ = number of documents on the Web
+ This formula was developed by Gerard Salton (1927-1995), Cornell University, based on the information theory of Claude Shannon.
o the distance between the search words in the page: the closer they are to each other, the more relevant the page is considered for the search. See: [3]
* The country indicated by the Google URL: google.be gives strong preference to sites in Belgium, google.fr to French sites, google.com to U.S. sites and google.co.uk to UK sites, etc. It is really important to choose the "localization" of your search. The following page can often serve as a start page for a search: [4]
* The language of the user, which is also taken into account alongside the search words. The only form for specifying it is [5]. The only other way to change the user language is to edit the Google URL by hand (http://www.google.be/search?hl=fr&q=...). Change the parameter hl=xx, where xx is the two-letter code of the desired language.
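The tf-idf weight and the word-distance criterion from the list above can be illustrated on a toy corpus of tokenized pages. This sketch makes three assumptions:

* tf is the raw count of the term in the page.
* The logarithm is the natural logarithm.
* Proximity is the length of the shortest run of consecutive words containing every query term.

Google's actual weighting is undocumented, so these choices are illustrative only.

```python
import math
from collections import Counter


def tf_idf(term, page, corpus):
    """Weight w = tf * log(D / df) of one term in one tokenized page."""
    tf = Counter(page)[term]                      # occurrences in the page
    df = sum(1 for doc in corpus if term in doc)  # pages containing the term
    if tf == 0 or df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)        # D = number of pages


def min_window(page, terms):
    """Length of the shortest run of words containing every term, or None."""
    terms = set(terms)
    if not terms:
        return 0
    counts = Counter()
    have, best, left = 0, None, 0
    for right, word in enumerate(page):
        if word in terms:
            counts[word] += 1
            if counts[word] == 1:
                have += 1
        # Shrink the window from the left while it still covers all terms.
        while have == len(terms):
            span = right - left + 1
            if best is None or span < best:
                best = span
            dropped = page[left]
            if dropped in terms:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    have -= 1
            left += 1
    return best


# Hypothetical three-page "web", already split into words.
corpus = [
    "miserable failure of the plan".split(),
    "the plan was a success".split(),
    "a miserable day".split(),
]
print(round(tf_idf("failure", corpus[0], corpus), 3))  # → 1.099 (log 3)
print(min_window(corpus[0], ["miserable", "plan"]))    # → 5
```

A rare word such as "failure" (one page out of three) weighs more than "miserable" (two pages out of three). A smaller window means the query words are closer together.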
It is essential to set the user language to match the language of your search words. Google then favors documents in that language (and may one day use a good synonym dictionary). It also applies the appropriate algorithm to treat singular and plural, feminine and masculine forms as equivalent (reminder: Dutch seems poorly supported at the moment).
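The hl parameter described above can also be set programmatically. This sketch uses the standard urllib.parse.urlencode function to build a URL of the form quoted in the text (http://www.google.be/search?hl=fr&q=...). The helper name search_url is hypothetical. Only the hl and q parameters come from the text.

```python
from urllib.parse import urlencode


def search_url(query, lang, domain="www.google.be"):
    """Build a search URL whose interface language is set by hl.

    Only the hl and q parameters come from the text; the function name
    and the default domain are illustrative choices.
    """
    if len(lang) != 2 or not lang.isalpha():
        raise ValueError("hl expects a two-letter language code")
    # urlencode escapes the query (spaces become '+', accents are %-encoded).
    return f"http://{domain}/search?" + urlencode({"hl": lang.lower(), "q": query})


print(search_url("gratte-ciel", "fr"))
# → http://www.google.be/search?hl=fr&q=gratte-ciel
```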
Additional functions
Google also offers additional functions:
* In the headlines: for some keywords related to current events, the top results include 3 article titles from Google News, along with a button to search the news.
* Currency conversion: e.g. type 3 euros in dollars in the search field, and Google will display: €3 = x.xxxxx U.S. dollars (rates provided by Citibank, without guarantee).
* Google Calculator: type a mathematical formula in the search field
* Machine Translation
* PDF Files
* Page caching: displays the copy of the page stored in Google's database, which is useful if the page no longer exists
* Similar pages
* Links: type link:site.com in the search field to view external pages pointing to the specified URL
* Targeting operators: restrict a search to a single web address. Syntax: "site:address your query".
* I'm Feeling Lucky
* Definitions: provides one or more definitions (or none) of words, taken from various websites (mainly Wikipedia and Wiktionary, among others). This function is now available in English, French, Spanish, German, Chinese, Italian and Russian. Syntax: "define:word to define"
* Google Movies: type a film title to view reviews of that film (movies: title gives reviews in English). On Google Movies, you can choose between a web search for reviews of the film and a search for showtimes at cinemas in some cities.
SearchWiki
Since November 21, 2008, the SearchWiki feature has let users personalize the Google results page in the English version. It appeared on the French version of Google on April 28, 2009 [2].
Special characters
Google handles accents written as HTML entities, but not as Unicode. Therefore, searching for "ALKENE" and "alkene" does not give the same result, because a single word is searched with preference given to the form in which it was written. By contrast, searching for "encyclopedia" or "ENCYCLOPEDIA" changes nothing.
If you type "recipe for the soup * and tomato", Google will suggest basil or pumpkin in place of the asterisk. You can extend a search to the synonyms of a word by preceding it with the symbol "~". The "+" forces Google to interpret the word exactly as written (this is particularly useful for accents in French).
Misuse of Google
The many features of Google have given rise to various recreational uses by Internet users.
Positioning contests
Many positioning contests have emerged on Google and other engines. The goal is to place a page among the first search results for a more or less fictitious keyword. The first major contest was on the query SERPS. In 2004, a contest on the French expression for "stork eater" attracted 170 participants and reached 420,000 Google queries for that phrase. The motivations behind these contests have been debated. Some see them as useful tools for SEO experimentation. For others, the only motivation is fun, which turns Google into a mere playground.
Google bomb
Google bombing consists of linking many web pages to a particular website using a given expression. A Google search on that phrase then returns the site in question among the first results. Google bombing campaigns are run through forums or blogs that encourage users to take part. Each participant simply adds to their website or blog a link to the target site, using the expression as the link text.
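The mechanism exploits a fact noted under Sorting results: a page can be indexed under the words other sites use in links pointing to it. A toy anchor-text index shows why enough coordinated links suffice, since the target becomes associated with a phrase that never appears on it. The domains, the helper names and the scoring (a plain count of links per anchor phrase) are hypothetical simplifications, not Google's ranking.

```python
from collections import defaultdict


def build_anchor_index(links):
    """Map each anchor phrase to a count of links per target URL.

    links: iterable of (source_url, target_url, anchor_text) tuples.
    """
    index = defaultdict(lambda: defaultdict(int))
    for _source, target, anchor in links:
        index[anchor.lower()][target] += 1
    return index


def rank_by_anchor(index, phrase):
    """Targets ordered by how many links use the phrase as anchor text."""
    hits = index.get(phrase.lower(), {})
    return sorted(hits, key=hits.get, reverse=True)


# 50 coordinated blogs versus one genuine page that uses the same words.
links = [(f"blog{i}.example", "whitehouse.example/bio", "Miserable Failure")
         for i in range(50)]
links.append(("news.example", "dictionary.example/failure", "miserable failure"))
index = build_anchor_index(links)
print(rank_by_anchor(index, "miserable failure"))
# → ['whitehouse.example/bio', 'dictionary.example/failure']
```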
One of the first sites targeted by such a bombing was the biography of U.S. President George Walker Bush [6] on the White House website. For a long time, a Google search for "failure" or "miserable failure" returned this page as the first result. This lasted until the Mountain View company adjusted its system, which was expected to significantly reduce Google bombing (see below).
In autumn 2005, Nicolas Sarkozy's political party launched a massive email campaign. In retaliation, webmasters called for a Google bombing targeting the Minister of the Interior. As a result, typing Nicolas Sarkozy [7] into Google returned, in second place, a link to Iznogoud, the cartoon character who wants to be caliph instead of the caliph. Google bombing consists of placing a link on a website (to Iznogoud or George Bush) and associating it with a text (Nicolas Sarkozy or miserable failure). If enough webmasters do this, the misleading links quickly appear among Google's first results.
At the end of January 2007, Google announced that it had developed an algorithm to counter Google bombing in every language. The query "miserable failure" now leads to a page explaining Google bombing.
Google Fight
A Google Fight compares the number of results Google returns for several expressions. The expression with the most results is declared the winner. Users have fun comparing names, ideas, politicians, etc. A website has even been created to provide an interface for this type of "fight" [8].
Since January 2006, Google's team has been intercepting Google Fight queries and returning fanciful results. You can check this by querying the site several times with the same pair of names.
Google Whacks
A Google Whack is a game of finding two words that, combined in a Google search, return a single result. Both words must exist in the dictionary, and the site found must not be a mere list of words. Quotation marks and punctuation must not be used. The score is often calculated by multiplying the number of results for the first word by the number of results for the second word. [9]
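The scoring rule above is a single multiplication. It is paired with the requirement that the combined search return exactly one result. The helper names and the result counts in the example are hypothetical.

```python
def whack_score(hits_first, hits_second):
    """Score of a pair: results for the first word times results for the second."""
    if hits_first < 0 or hits_second < 0:
        raise ValueError("result counts cannot be negative")
    return hits_first * hits_second


def is_googlewhack(combined_hits):
    """A pair only counts if searching both words together gives exactly one result."""
    return combined_hits == 1


# Hypothetical result counts for two words and for the combined search.
if is_googlewhack(1):
    print(whack_score(1_200_000, 350_000))  # → 420000000000
```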
"Fake Google"
Some search engines are copies of Google in a minority, non-official language. Most of them are created for humorous purposes.
* Google in Ch'ti: Gogol
* Google in Walloon: Gôgueule
* Google in West Flemish: Hoegel
* Google in the Présipauté of Groland: Grögler
Limitations and errors of Google
The main limitation of Google is that the engine covers only the visible web. It leaves aside all professional databases, which are sometimes enormous and often relevant but whose access is restricted (though sometimes free). Example: Dialog (15,000 GB).
Studies show internal limits of Google. At certain periods, identical searches announce widely varying numbers of results [3]. Comparing the results of certain searches also reveals inconsistencies, due to technical limitations [4] [5] [6].
The size of the database
Several studies have shown that the number of pages actually indexed is only half the number announced. The other half would consist of pages visited by Google's robot but only partly indexed (the header without the body of the page). These pages are mostly non-English, because of the AdWords technology, which is used only for English and is Google's main source of funding.
This notion of index size has been, and remains, a major marketing argument for search engines. In January 2005, Jean Veronis began a critical analysis of Google's index size [5]. Following it, in late 2005, Google decided to stop putting this argument forward. [citation needed]
As an example of this marketing approach, Google announced a doubling of its index size the day after the launch of MSN Search [citation needed].
Search effectiveness
For searches of medium complexity (using a Boolean operator, i.e. a space [AND operator]), the number of results can vary up to threefold within the same day in some cases. The order of magnitude can range from one to ten. Sometimes the search ignores the requested operators.
This variability in the number of results reflects Google's architecture. Several servers scattered around the world host the index of the pages visited by Google. A query is sent to one server or another depending on the user's location (or on the local Google site queried). In principle, every index is identical to the others. However, they are not synchronized in real time but at intervals of more than a month. Only the main index, located in California, is constantly updated and gives the most correct answers. The main server can therefore return ten times more results than a secondary server.
Evaluation of the engine
According to Jean Veronis [7], Yahoo! and Google are the two best engines among six major French-language engines. For the author, these two engines have equivalent performance, and the reason for users' massive preference for Google is the relevance of its results.
However, according to Trent, Google may perform less well than Windows Live Search [citation needed].
Controversies
Controversy about the influence of the content of the results displayed
By becoming the most used search engine, Google has become the leading vehicle for information on the Internet. This role of conveying information is inherent in the business of search engines. The resulting problems are therefore not all due to Google, which does not author the content of the pages.
Beyond the difficulties raised by the strategic importance of Google ranking in the economic field, the real problem is ideological. The pages that appear in the first results have a strong ideological influence, because they come to be taken as gospel. The popularity of a search engine such as Google can therefore make it a vehicle for misinformation. A site's influence is all the greater when the keyword is popular and the site tops the list. Google's executives admit [8] that they are powerless against the disinformation and defamation that currently appear in Google's first results, since the technology cannot judge the sincerity of information.
Tiananmen case
A search on Google Images for Tiananmen returned photos of tanks suppressing the student revolt, which embarrassed the leaders of the People's Republic of China. In 2006, they obtained that the query "Tiananmen" on Google's Chinese portal would no longer return these images [citation needed].
BMW Germany case
BMW Germany and its SEO agency tried to increase the company's PageRank, and therefore the ranking of its links for queries such as "BMW car" on Google. As a result, Google blacklisted the car company and removed it from its index in January 2006. A search for "BMW" then returned only references to its worldwide website [10].
Keywords case in France
In 2005, the UMP, and especially Nicolas Sarkozy, was criticized for buying dozens of keywords such as "riot", "CPE" and "Jack Lang" that pointed to the UMP's website.
Controversy over the number of results displayed
When the number of pages is too large, only the first 1,000 pages can be viewed. This is a reasonable limit, adopted by most search engines. However, some users suspect that the number of pages found is artificially "inflated" when it exceeds this limit. This hypothesis is based on two facts:
* Google sometimes displays a number of pages larger than the number of web pages it has indexed (for example, with a query on a word used in every English page, such as the definite article "the");
* When several successive searches are made on the same keyword, the result varies. This reflects the number of servers used by Google, each of which has a different number of pages recorded for the same query.
See for example the message <449d92eb$0$1002$ba4acef3@news.orange.fr> on the Usenet group fr.sci.physique and [11]
Notes and references
1. "Who's afraid of Google", G.F., Challenges, no. 180, September 17, 2009, p. 49
2. New Google: customizing results [archive]
3. http://www.bases-publications.com/revues/netsources/e-docs/00/00/02/C9/document_article.phtml [archive], Bases Publications article
4. (en) Mark Liberman, "Google recall (They stole his mind, he now wants it back.)" [archive], January 24, 2005
5. a b Jean Veronis, "Bogus counts at Google" [archive], January 26, 2005
6. Jean Veronis, "Web: The mystery of Google's missing pages solved" [archive], February 8, 2005
7. "[PDF] Comparative study of six search engines" [archive], February 2006
8. Interview with Eric Schmidt, Google Defining, television documentary broadcast by CBS News (January 2005)
See also
Related articles
* ElgooG, a humorous mirror site of Google.
* Link farm, a method of manipulating the Google search engine
* SearchMash, an "experimental" Google engine (discontinued in November 2008)
External links
* Google.com
* Google.fr
* Special Features of Google
* (en) Experimental features