Showing posts with label Technical SEO.

October 20, 2024

Robots.txt Guide for SEO

An optimized robots.txt strategy improves SEO, and blocking unnecessary URLs is one of the most critical steps in that strategy.

Robots.txt plays an essential role in SEO strategy. Beginners tend to make mistakes when they do not understand how to use robots.txt on their websites.

It is responsible for your website’s crawlability and indexability.

An optimized Robots.txt file can significantly improve your website’s crawling and indexing.

Google also tells us to use robots.txt to block action URLs such as login, signup, checkout, and add-to-cart.

Robots.txt Guide for SEO: eAskme

But how do you do it the right way?

Here is everything!

What is Robots.txt?

The robots.txt file is a plain-text file that you place in your website’s root folder. It tells crawlers which parts of your website they may crawl.

Robots.txt contains four critical directives:

  1. User-agent: Specifies whether the rules apply to every crawler or only to specific, targeted crawlers.
  2. Disallow: Pages you do not want search engines to crawl.
  3. Allow: Pages or parts of the website that you want to allow for crawling.
  4. Sitemap: The link to your XML sitemap.

The robots.txt file is case-sensitive: /Login/ and /login/ are treated as different paths.
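
Putting the four directives together, a minimal robots.txt file might look like the sketch below. The login paths preview the example explained in the next section, and the sitemap URL is a placeholder for your own:

User-agent: *
Disallow: /login/
Allow: /login/registration/
Sitemap: https://www.newexample.com/sitemap.xml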

Robots.txt Hierarchy:

Robots.txt rules should follow a clear, optimized order.
The most common robots.txt layout is as follows:

  1. User-agent: *
  2. Disallow: /login/
  3. Allow: /login/registration/

The first line applies the rules that follow to every crawler.

The second line disallows search bots from crawling login pages or URLs.

The third line allows the registration page to be crawled.

Simple Robots.txt rule:

User-agent: *
Disallow: /login/
Allow: /login/

Here the Disallow and Allow rules are equally specific. In such a conflict, Google follows the least restrictive rule, so the search engine will still access the login URL.

Importance of Robots.txt:

Robots.txt helps optimize your crawl budget. When you block unimportant pages, Googlebot spends its crawl budget only on relevant pages.

Search engines prefer an optimized crawl budget, and robots.txt makes it possible.

For example, you may have an eCommerce website where checkout, add-to-cart, filter, and category pages do not offer unique value. They are often treated as duplicate content. It is best not to spend your crawl budget on such pages.

Robots.txt is the best tool for this job.
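
As a rough sketch, the rules for such a store could look like this (the /cart/ and /checkout/ paths and the filter parameter are placeholders that depend on your platform):

Disallow: /cart/
Disallow: /checkout/
Disallow: *filter=*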

When Should You Use Robots.txt?

Always use robots.txt on your website to:

  • Block unnecessary URLs such as categories, filters, internal search, cart, etc.
  • Block private pages.
  • Block irrelevant JavaScript files.
  • Block AI Chatbots and content scrapers.

How to Use Robots.txt to Block Specific Pages?

Block Internal Search Results:

You do not want your internal search results indexed, and it is pretty easy to block these action URLs.

Just open your robots.txt file and add the following rule:

Disallow: *s=*

This line disallows search engines from crawling internal search URLs (any URL that contains s= in it).

Block Custom Navigation:

Custom navigation is a feature that you add to your website for users.

Most e-commerce websites allow users to create “Favorite” lists, which are displayed as navigation in the sidebar.

Sorting and filtering options also create faceted navigation, which can generate endless URL variations.

Just open your robots.txt file and add the following rules:

Disallow: *sortby=*
Disallow: *favorite=*
Disallow: *color=*
Disallow: *price=*

Block Doc/PDF URLs:

Some websites upload documents in PDF or .doc formats.

You do not want them to be crawled by Google.

Here is the code to block doc/pdf URLs:

Disallow: /*.pdf$
Disallow: /*.doc$

Block a Website Directory:

You can also block entire website directories, such as forms, user pages, and chats.

Add a Disallow rule for each directory to your robots.txt file. For example, to block the forms directory:

Disallow: /form/

Block User Accounts:

You do not want user account pages to show up in search results.

Add this code in Robots.txt:

Disallow: /myaccount/

Block Irrelevant JavaScript:

Add a simple line of code to block non-relevant JavaScript files.

Disallow: /assets/js/pixels.js

Block Scrapers and AI Chatbots:

Many website owners now block AI chatbots and content scrapers in their robots.txt files to keep their content from being harvested.

Add this code to your Robots.txt file:

#ai chatbots
User-agent: anthropic-ai
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: cohere-ai
User-agent: Diffbot
User-agent: FacebookBot
User-agent: GPTBot
User-agent: ImagesiftBot
User-agent: Meta-ExternalAgent
User-agent: Meta-ExternalFetcher
User-agent: Omgilibot
User-agent: PerplexityBot
User-agent: Timpibot
Disallow: /

To block scrapers, add this code:

#scrapers
User-agent: magpie-crawler
User-agent: omgilibot
User-agent: Node/simplecrawler
User-agent: Scrapy
User-agent: CCBot
User-agent: omgili
Disallow: /

Allow Sitemap URLs:

Declare your sitemap URLs in robots.txt so that crawlers can find them:

Sitemap: https://www.newexample.com/sitemap/articlesurl.xml
Sitemap: https://www.newexample.com/sitemap/newsurl.xml
Sitemap: https://www.newexample.com/sitemap/videourl.xml

Crawl Delay:

Crawl-delay is ignored by Google but honored by some other search bots. You can set it to tell a bot to wait a specific number of seconds before crawling the next page.
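
As a sketch, this is how the directive looks for a bot that honors it; Bingbot is used here only as an assumed example, and the delay value is in seconds:

User-agent: Bingbot
Crawl-delay: 10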

Google Search Console Robots.txt Validator

  • Go to Google Search Console.
  • Click on “Settings.”
  • Go to “robots.txt.”
  • Click on “Request a recrawl.”

It will then fetch and validate your robots.txt file.
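
If you also want to test your rules outside Search Console, Python’s standard-library robots.txt parser can check whether a given URL is allowed. This is a minimal sketch; the domain and paths are placeholders:

from urllib.robotparser import RobotFileParser

# Fetch and parse the live robots.txt file (placeholder domain).
rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

# Ask whether a generic crawler ("*") may fetch specific URLs.
print(rp.can_fetch("*", "https://www.example.com/login/"))               # False if /login/ is disallowed
print(rp.can_fetch("*", "https://www.example.com/login/registration/"))  # True if explicitly allowed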

Conclusion:

Robots.txt is an important tool for optimizing the crawl budget. It impacts your website’s crawlability, which in turn impacts the indexing in search results.

Block unnecessary pages to allow Googlebot to spend time on valuable pages.

Save crawl resources with an optimized robots.txt file.


July 17, 2024

Website Crawling: What, Why and How To Optimize It?

 
Website crawling depends upon many things, such as website structure, internal linking, sitemap, etc.

It is important to ensure that Googlebot and other search engine bots can easily crawl your website.

Without crawling your website content, Google cannot find and index the pages.

Optimize website crawling to expand your content reach to search engines.

Here is what you must know about website crawling.

What is Crawling in SEO?

Website Crawling, What, Why and How To Optimize It: eAskme

In SEO, crawling means letting search engine bots discover your content.

Ensure search engine bots can access your website content, such as videos, text, images, links, etc.

How Do Search Engine Web Crawlers Work?

Search engine crawlers discover your pages and links and download your webpage content.

After crawling the content, search engine bots send it to the search index. They also extract the links they find on each page.

Crawled links can fall into different categories, such as:

  • New URLs
  • Pages with no crawl directives.
  • Updated URLs
  • Not-updated URLs
  • Disallowed URLs
  • Inaccessible URLs

The extracted URLs are added to the crawl queue and assigned priorities.

Search engines assign priority based on many factors.

Search engines have created their algorithms to crawl and index website content.

You should know that popular search engine bots such as Googlebot, Yahoo Slurp, YandexBot, DuckDuckBot, Bingbot, etc., work differently.

Why Should Every Webpage Be Crawled?

If a page is not crawled, it will never get indexed in SERP. It is necessary to let search engines quickly crawl website pages as soon as you make any changes or publish new posts.

The latest posts will be irrelevant if not crawled quickly.

Crawl Efficiency Vs. Crawl Budget:

Google’s search bots will not crawl and index your entire website.

Complete (100%) crawling rarely happens; most massive sites face crawling issues.

You will find non-indexed URLs under “Discovered – currently not indexed” in the Google Search Console report.

Even if no pages appear under this section yet, you may still face some crawling issues.

Crawl Budget:

The crawl budget refers to the number of pages Googlebot wants to crawl in a specific period.

You can check the crawl requests in “Google Search Console.”

Here you should understand that increasing the number of crawls does not mean that all the pages are getting crawled. It is better to improve the quality of crawling.

Crawl Efficiency:

Crawl efficiency refers to how quickly a page or an update gets crawled after you publish it.

Optimizing crawl efficiency can make a bigger impact on your rankings than chasing a larger crawl budget.

Search Engine Support for Crawling:

Best crawling practices help search engines rank optimized pages and reduce the carbon footprint of running search engines.

SEOs are talking about two APIs that can improve search crawling:

  • Non-Google Support from IndexNow
  • Google Support from The Indexing API

These APIs push your content to search engine crawlers for quick crawling.

Non-Google Support from IndexNow:

The IndexNow API is one of the most popular APIs; Bing, Seznam, and Yandex use it for quick indexing. Right now, Google does not support the IndexNow API.

Not only search engines but also CDNs, CRMs, and SEO tools use the IndexNow API to get pages crawled and indexed quickly.

If your audience does not come from these search engines, you may not see massive benefits from IndexNow.

You should also know that IndexNow will add additional load to your server, so weigh that cost against the crawl-efficiency gains.
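
For reference, a single IndexNow submission is just an HTTP POST with your host, your verification key, and the URLs you want crawled. This is a hedged sketch using the shared api.indexnow.org endpoint; the host, key, and URL list are placeholders you would replace with your own:

import requests

# Placeholder values: your domain, your IndexNow key, and the key file hosted on your site.
payload = {
    "host": "www.newexample.com",
    "key": "your-indexnow-key",
    "keyLocation": "https://www.newexample.com/your-indexnow-key.txt",
    "urlList": [
        "https://www.newexample.com/new-post/",
        "https://www.newexample.com/updated-post/",
    ],
}

response = requests.post(
    "https://api.indexnow.org/indexnow",
    json=payload,
    headers={"Content-Type": "application/json; charset=utf-8"},
    timeout=10,
)
print(response.status_code)  # 200 or 202 means the submission was accepted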

Google Support from the Google Indexing API:

Google Indexing API is for those who want to improve Google crawl efficiency.

Google has said that the Indexing API is only for pages with job posting or livestream event markup. But webmasters have found that it can also help improve crawl efficiency for other pages.

Here you should understand that crawling is not indexing: Google may crawl your page, but if it does not meet the requirements, it will not index it.
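
For reference, here is a hedged sketch of an Indexing API call. It assumes you have the google-auth library installed and a service-account JSON key with access to your Search Console property; the file name and URL below are placeholders:

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# Placeholder service-account key file.
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES
)
session = AuthorizedSession(credentials)

# Tell Google a URL was added or updated (use "URL_DELETED" for removals).
response = session.post(
    ENDPOINT,
    json={"url": "https://www.newexample.com/job-posting/", "type": "URL_UPDATED"},
)
print(response.status_code, response.json())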

Manual Submission Support in Google Search Console:

You can submit your URLs manually in the Google search console.

But you should only submit around 10 URLs within 24 hours. You can also use third-party apps or scripts for automated submissions.

How to Make Website Crawling More Efficient?

Server Performance:

Always host your website on a reliable and fast server. Your host status in Google Search Console’s Crawl Stats report should display as green.

Get rid of meaningless content:

Remove outdated and low-quality posts to improve crawl efficiency. This will help you in fixing the index bloat issue.

Go to the “Crawled – currently not indexed” section, fix the issues causing 404 pages, and use 301 redirects where content has moved.

When to use Noindex:

Use Noindex and rel=canonical tags to clean your Google search index report. You can even use robots.txt to disallow pages you do not want search engines to crawl.

Block non-SEO URLs such as parameter pages, functional pages, API URLs, and unnecessary style, script, and image files.
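
As a hedged sketch, such rules could look like this; the /api/ directory and the ref parameter are placeholders for whatever your site actually uses:

Disallow: /api/
Disallow: *ref=*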

Fix pagination issues to improve crawling.

Optimize Sitemap:

Use XML sitemap and optimize it for better crawling.

Internal Linking:

Internal links can significantly improve crawl efficiency.

Use breadcrumbs, pagination, and filter links to connect pages with plain HTML links rather than scripts.

Conclusion:

Website crawling is important for the success of any online website or business. It is also the foundation of SEO.

Optimize your web crawling performance and fix issues to improve crawl efficiency.


September 03, 2023

Google Pagespeed Insights Lighthouse 11 Update: What is There for You?

Google has launched an update for PageSpeed Insights; this time, it is updating Lighthouse to version 11 with new features.

Lighthouse 11 is the newest version of Google Lighthouse, which powers Google PageSpeed Insights.

You can check your website speed, fix bugs, and get a score.

What is Google Pagespeed Insights Lighthouse 11?

Google Pagespeed Insights Lighthouse 11 Update: eAskme

Google Lighthouse is a free-to-use tool to measure your website performance issues and find ways to fix them.

All you need to do is type your website or webpage URL and hit Enter to get a PageSpeed Insights report with Core Web Vitals scores.

The Lighthouse 11 update for PageSpeed Insights launched on August 28, 2023.

It is available to every user who wants to test their website.
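
You can also pull the same report programmatically through the PageSpeed Insights v5 API, which runs Lighthouse behind the scenes. Here is a minimal sketch; the page URL is a placeholder, and an API key is optional for light use:

import requests

API = "https://www.googleapis.com/pagespeedonline/v5/runPagespeed"
params = {
    "url": "https://www.example.com/",  # placeholder page to test
    "strategy": "mobile",               # or "desktop"
}

data = requests.get(API, params=params, timeout=60).json()

# Overall Lighthouse performance score (0-1) and one lab metric from the report.
print(data["lighthouseResult"]["categories"]["performance"]["score"])
print(data["lighthouseResult"]["audits"]["largest-contentful-paint"]["displayValue"])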

What Has Changed in Lighthouse 11 and Google Pagespeed Insights?

Here are the updates and changes you will find:

  • Updated accessibility audits.
  • Updated best practices score.
  • Fixed a bug related to Largest Contentful Paint.
  • Updated Interaction to Next Paint (INP) reports.
  • Fixed many bugs to improve reports.

Updated Interaction to Next Paint (INP) Report:

Google introduced the Interaction to Next Paint metric to measure the interactivity of webpages.

The search engine giant has also said that the INP metric will become an official Core Web Vital in March 2024.

Now, Google has also removed the INP metric from the experimental stage.

The recent change in Lighthouse 11 shows that Google is serious about the INP metric.

Bug fixed in Largest Contentful Paint (LCP):

A bug in the Largest Contentful Paint audit was reported after the launch of Lighthouse 10.2.0; Lighthouse 11 fixes it.

13 new accessibility audits:

  • td-has-header
  • table-fake-caption
  • table-duplicate-name
  • skip-link
  • select-name
  • link-in-text-block
  • label-content-name-mismatch
  • input-button-name
  • image-redundant-alt
  • html-xml-lang-mismatch
  • aria-text
  • aria-dialog-name
  • aria-allowed-role

Conclusion:

Google Lighthouse 11 is an essential and official update for Google PageSpeed Insights. You will get better reports to check your website’s performance, along with the bug fixes described above.

Check your website performance with Google Lighthouse 11 and fix issues to make it user-friendly. It is also essential for the overall SEO of your website.
