Specifying Metadata

Metadata helps search engines track and index web pages. Several meta tags should be included in any web page that you want search engines to index, because search engines are influenced by the metadata you provide through meta tags.

Why should we specify data in our web pages for search engines? You may be thinking that we write pages for people, not for search engines. In fact, a successful website cannot afford to exclude information that helps search engines index its pages, because search engines bring traffic to websites. (Think of the last time you used a major search engine such as www.google.com, www.yahoo.com, or www.msn.com. Would you have found the site you were looking for without searching first? Probably not: with millions of websites online, it is very difficult to find one without searching first.) One way to help search engines do their job is to use meta tags.

Search engines use meta tags and the content of the web page to determine where your page appears when someone runs a search. Obviously, you want your pages to come up first or near the top of the results. For example, if you go to Google.com and type script buzz in the search box, chances are that www.scriptbuzz.com will come up first or on the first page, even though Google returns thousands of other pages related to “script buzz.” A web searcher is much more likely to click a link that comes up first or near first than one that is more clicks away.

Learn next about:

  • Meta tag attributes
  • HTML meta tags: keywords and description
  • Robot exclusion
    • robots.txt file
    • Robot control with <meta> tag
  • Search engine optimization

Meta tag attributes

Meta tags are used to provide information about a web page. They are optional and easy to use. A meta tag has three main attributes: name, content, and http-equiv. The name attribute specifies the type of information, and the content attribute holds the meta-information itself. (Access this page to learn how to use these two attributes to optimize your web page for search engines.) Lastly, the http-equiv attribute specifies a particular HTTP header type.

Name attributes

There are more than a dozen values for the name attribute. The purpose of the name attribute is to provide information to search engines. The following table summarizes the most common values:

Attribute name       Explanation
description          Description of the web page.
keywords             List of keywords that describe the content of the web page.
creator or author    The organization or person responsible for creating the web page.
date                 The date of publication in yyyy-mm-dd format.
identifier           A unique number identifying the web page.
language             Language of the page, as a two-character language code.
rights               A copyright statement for the page.
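The name attributes above can be combined freely in a page's head section. The values below are hypothetical examples for illustration, not taken from a real site:

```html
<head>
<!-- Hypothetical example values for the common name attributes -->
<meta name="author" content="Example Author">
<meta name="date" content="2009-04-15">
<meta name="language" content="en">
<meta name="rights" content="Copyright Example Author. All rights reserved.">
</head>
```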

http-equiv attributes

The http-equiv attributes are equivalent to HTTP headers and control how a browser handles a requested page. When displaying the page, the browser follows the instructions specified by the http-equiv attributes. The instructions may state, for example, when the content of the page expires or when the page should be refreshed.
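For instance, two commonly used http-equiv instructions are refresh (reload the page after a given number of seconds) and content-type (declare the document's media type and character set). The values here are illustrative:

```html
<!-- Reload this page every 300 seconds (illustrative value) -->
<meta http-equiv="refresh" content="300">
<!-- Declare the document's media type and character encoding -->
<meta http-equiv="content-type" content="text/html; charset=utf-8">
```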

Specifying keywords and description

Meta tags are important because they provide extra information to search engines about your web pages. The meta keywords and meta description tags are both important and should not be left out of any web page that you want ranked above other search results. As you work with meta tags, keep in mind that these two tags alone cannot place your pages at the top of search results, but they do help your web pages in search engines.

The meta tags are placed in the “head” section of a web page. For example,

<head>
<meta name="keywords" content="scripting, scripting HTML, HTML help, Meta tags, meta tag, search engine, search engines">

<meta name="description" content="Provide a brief description of your page here. Remember to repeat important keywords here. Consider as an example: This page provides information about meta tags and how to create meta keyword and meta description tag. Meta tags are important to search engines as they provide extra information about a web page.">
</head>

Remember that the “head” tag is placed before the body tag in an HTML document. The “head” tag should contain your meta tags, as shown in the above HTML code. A meta tag starts with the word meta. The name attribute of a meta tag specifies the type of meta tag you want to create.

In our HTML code, we first created the keywords tag by setting the name attribute to keywords. The content attribute of a meta tag contains the information that you want to add to that particular tag. Since we are creating the keywords tag, we list the keywords that are important and relevant to the page inside the content attribute.

To create the description meta tag, start with a meta tag as in the example above. Note that we again use the name and content attributes for the description tag. This may lead you to wonder: how can search engines tell the description tag from the keywords tag when both are created with the same attributes? Although both tags use the same attributes (name and content), the values (inside the double quotation marks) of those attributes differ for each tag.

For the description tag, we set the name attribute to description and the content attribute is set to a brief description of the page. In comparison, the name attribute for a keyword tag is set to “keywords” and the content attribute is set to a list of important keywords.

For your convenience, Script buzz has developed a meta tag generator tool that can help you create these meta tags.

Robot exclusion with robots.txt file

robots.txt is a text file that tells search engine robots which pages within a website they do not have permission to index. If you have a web page (or a file or directory) that you do not want robots to index (because it is a log file, a private directory, etc.), you can restrict robots' access to it with a robots.txt file. When a robot attempts to index a site, it requests the robots.txt file first.

Suppose a search engine robot is about to index https://www.scriptbuzz.com. First, the robot will request http://www.scriptbuzz.com/robots.txt. (If the robots.txt file does not exist or is empty, the robot will index all files. Also, the file name is case-sensitive: it must be in lowercase. Note the robots.txt file must be in the root directory of a web site.) Then, it analyzes the robots.txt file for instructions on what documents from www.scriptbuzz.com it should exclude from indexing.

If you do not already have a robots.txt file, you can create one with Notepad or another text editor. Save the file in the root directory of the website, and name it exactly robots.txt.

The basic format of a robots.txt file is a listing of the particular spider whose access you want to limit, followed by statements that specify which directory paths to disallow. You can also use the wildcard * to specify rules for all spiders. For instance, the following:

User-agent: *
Disallow: /images/

denies all spiders access to the images folder. If, however, you wanted to deny access to a specific spider such as Googlebot (Google's spider), you would add to your robots.txt file:

User-agent: Googlebot
Disallow: /images/

This denies access to the images folder for the Googlebot spider.

To deny access to a specific file, specify the user agent and location of the file:

User-agent: *
Disallow: /temp.htm
Disallow: /contact-us/contact.htm

This denies all spiders access to the temp.htm and contact-us/contact.htm files. Note that temp.htm is located in the root directory of the website, while contact.htm is located in a folder called contact-us.

When creating a robots.txt file, make sure not to reveal any files that contain sensitive or private information. By revealing the location of those files, you may help malicious visitors or robots misuse them.
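Putting these pieces together, a single robots.txt file can mix per-spider records with a wildcard record for everyone else; a robot obeys the record that names it most specifically. The /logs/ directory below is a hypothetical example:

```text
# Rules for Google's spider only
User-agent: Googlebot
Disallow: /images/

# Rules for all other spiders (/logs/ is a hypothetical directory)
User-agent: *
Disallow: /images/
Disallow: /logs/
```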

Robot control with <meta> tag

In addition to controlling robots' access to your website with a robots.txt file, you can use the <meta> tag with the name attribute set to robots. If you use the <meta> tag method to restrict robots' access, the tag must be placed in each page that you do not want indexed. The second attribute used with the <meta> tag is content, which determines what you want the robot to do with that particular web page:

  • all — Index the page and follow its links. If a robot does not see any robots meta tag, it indexes the current page and follows all links.
  • none — Do not index the page and do not follow any links on the page.
  • index — Index the page.
  • noindex — Do not index the page.
  • follow — Follow all the links on the page.
  • nofollow — Do not follow any links on the page.

The following, for example,

<meta name="robots" content="index">
<meta name="robots" content="nofollow">

tells robots to index the page but not follow any links on it. You could combine the two tags above into one to achieve the same result:

<meta name="robots" content="index, nofollow">

When using the <meta> tag to exclude robots from a particular page, make sure not to specify contradictory instructions such as:

<meta name="robots" content="noindex, index">

or

<meta name="robots" content="follow, nofollow">

Spiders may ignore contradictory instructions completely or process them only partially. Another disadvantage of the <meta> tag approach is that it is not as widely supported as the robots.txt file.

Optimizing pages for search engines

Internet searching is probably the second most commonly performed online activity, after emailing. Because searching is so common, websites are likely to get more of their traffic from search referrals than from any other medium. Studies estimate that 80% of website traffic comes from search engines; this website, for instance, gets around 60% to 68% of its traffic from search referrals. Given these statistics, website professionals should consider the significance of search engine optimization (SEO). SEO is important to any modern business because it potentially offers the highest return on investment of any way of promoting and marketing a website.

What is Search Engine Optimization?

Search engine optimization is the process of maximizing search rankings and web directory listings. The process involves the appropriate and relevant use of the following: keywords, description, text content, and links. It is not difficult to find a website that promises to increase your rankings with the products or services it offers. Be cautious about using such products or services, as they may get your website banned from search engines or otherwise hurt its rankings.

Effective SEO means adhering to best practices in several fields and cannot be achieved by a single product or service alone. Although specialized commercial tools are available for serious search engine optimization, simply following search engine guidelines is a great starting point for any website needing search engine optimization attention. As an example of such guidelines, see Google's Webmaster Guidelines.

Why is it important to appear in the top search results? As mentioned before, search engines are important drivers of traffic to a website, but only if the site ends up in the top results. Interestingly, studies show that only about 7% of searchers go beyond the third page of search results, so if your website is listed on page 4 or later, the chance of a searcher finding it is very low. Keep in mind that as the number of pages indexed by major search engines grows past 10 billion, ranking within the top 30 results can only be achieved with patience and the careful planning and execution of a decent search engine optimization strategy.

What factors are important to optimizing searches on search engines?

A number of websites are devoted entirely to this subject. Here we will discuss six fundamental factors that influence any successful search engine optimization effort:

  1. keywords — selecting relevant, correct keywords for a web page is probably the most critical aspect of search engine optimization. When selecting keywords, think about what words or phrases searchers will type to find the page you are designing. For example, relevant keywords for this page could be: search engine optimization, SEO, search engine optimizing, improving search results, search engines, etc. Irrelevant keywords for this page would be cars, cell phones, books, weather, etc. Also, keep the list of keywords short and avoid repeating words.
  2. title — many search engines consider the title of the page very significant in determining search rankings. Because of that, make sure to include your important keyword(s) in the title of the web page. An appropriate title for this page is search engine optimization, since that is its subject. When deciding on the title, make sure it is concise and appropriate for the page.
  3. description — the page description is also an important parameter in determining a page's search rankings. The description should be concise and relevant to the web page; think of it as a short summary of the page, and remember to include important keywords in it. An example description for this page is: this page discusses search engine optimization and the general principles behind optimizing pages for search engines.
  4. text content — this is perhaps the most decisive factor in how a search engine rates a web page. It does not matter how good your keywords, description, title, or incoming links are if the page has no content. The text content determines which search phrases to use as keywords, which description to write, which title to select, and so on. Plenty of contextual content helps page rankings; no content, or irrelevant content, hurts them.
  5. links — incoming links to a web page help increase its ranking. In principle, the more pages that link to a particular page, the more significant that page becomes. Think of incoming links as votes: the more votes a page has, the better its chances of improving its search rankings. However, the relevancy of the page casting the vote is probably more important than the quantity of votes received. In other words, it is better to have 2 relevant websites link to a page than 500 irrelevant or inappropriate ones.
  6. file names — the file name of the web page is also an important factor in determining a site's placement within search results. Use a file name that is appropriate for the page; our advice is to use the page title as the file name, without space characters, separating the words with a hyphen (-) or an underscore (_). An appropriate file name for this page, for example, is search-engine-optimization.asp.
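Tying these factors together, the head section of this page might look like the sketch below, using the title, description, and keywords suggested above, with the page saved as search-engine-optimization.asp:

```html
<head>
<!-- Title carries the page's main keyword phrase -->
<title>Search engine optimization</title>
<!-- Keywords and description drawn from the guidelines above -->
<meta name="keywords" content="search engine optimization, SEO, improving search results, search engines">
<meta name="description" content="This page discusses search engine optimization and the general principles behind optimizing pages for search engines.">
</head>
```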

Document caching with <meta> tag

Caching keeps a local copy of a web page or its embedded objects (such as images or media files) on a proxy server or local disk drive, which helps reduce redundant network traffic by avoiding fetching a fresh copy of the document from the website. Caching is useful when

  1. the content of the site does not change very often, and
  2. many visitors access the website or web page.

If you think about it, you would not want to fetch the same page over and over, particularly when its content rarely changes. By avoiding those repeated fetches, you reduce network traffic.

Caching sounds like a good idea, but it can be used too aggressively. If pages are cached too aggressively, users may inadvertently view stale content.
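Cache behavior can be suggested with http-equiv meta tags. Note that these are only hints: real HTTP headers sent by the server generally take precedence, and the date below is an illustrative value:

```html
<!-- Suggest that the page may be cached until this (illustrative) date -->
<meta http-equiv="expires" content="Sat, 31 Dec 2011 23:59:59 GMT">
<!-- Or, to discourage caching of a frequently changing page: -->
<meta http-equiv="cache-control" content="no-cache">
```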