In this article, we examine the role the robots.txt file plays in managing crawler traffic on websites, discuss whether it is required, and offer recommendations for configuring it to control page indexing effectively. We also look at examples of standard directives used in robots.txt and explain how to check that its configuration is correct.
Why Robots.txt Is Needed
Robots.txt is a file located on the site's server in its root directory. It tells search engine robots how they should crawl the resource's content. Used correctly, the file helps keep unwanted pages out of the index, keeps compliant crawlers away from private sections (it is a request to robots, not an access control mechanism), and can improve SEO efficiency and the site's visibility in search results. Robots.txt is configured through directives, which we examine next.
Setting Directives in Robots.txt
User-Agent
The first directive is User-Agent, where we specify the keyword that identifies a robot. When a robot finds its keyword, it understands that the rules that follow are intended specifically for it.
Consider an example of using User-Agent in the robots.txt file:
User-Agent: *
Disallow: /private/
This example indicates that every search robot (represented by the "*" symbol) should ignore pages in the /private/ directory.
Here is how the directive looks for specific search robots:
User-Agent: Googlebot
Disallow: /admin/
User-Agent: Bingbot
Disallow: /private/
In this case, the Googlebot search robot should ignore pages in the /admin/ directory, while Bingbot should ignore pages in the /private/ directory.
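The per-robot behavior described above can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the example.com URLs and the build_parser helper are assumptions made for illustration, not part of the original configuration.

```python
from urllib.robotparser import RobotFileParser

# The two-group configuration from the example above.
ROBOTS_TXT = """\
User-Agent: Googlebot
Disallow: /admin/

User-Agent: Bingbot
Disallow: /private/
"""


def build_parser(text):
    """Parse robots.txt text held in memory (no network request is made)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Hypothetical URLs on example.com, used only to exercise the rules.
for agent in ("Googlebot", "Bingbot"):
    for url in ("https://example.com/admin/users",
                "https://example.com/private/notes"):
        print(agent, url, parser.can_fetch(agent, url))
```

Each robot is refused only the directory named in its own group; a path listed under another robot's group stays open to it.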
Disallow
Disallow tells search robots which URLs on the website to skip. This directive is useful when you want to keep sensitive data or low-quality pages from being indexed by search engines. If the robots.txt file contains the entry Disallow: /directory/, robots will be denied access to the contents of the specified directory. For example,
User-agent: *
Disallow: /admin/
This rule indicates that all robots should ignore URLs beginning with /admin/. To block the entire site from being indexed by any robot, set the root directory as the rule:
User-agent: *
Disallow: /
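To see the difference between blocking one directory and blocking the whole site, the two rule sets can be compared side by side. This is a rough sketch with Python's standard urllib.robotparser; the allowed helper, the "AnyBot" agent name, and the /blog/post path are hypothetical.

```python
from urllib.robotparser import RobotFileParser

ADMIN_ONLY = "User-agent: *\nDisallow: /admin/\n"
BLOCK_ALL = "User-agent: *\nDisallow: /\n"


def allowed(robots_txt, path, agent="AnyBot"):
    """Return True if the given agent may fetch the path under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# /blog/post is an assumed path used only for the comparison.
print(allowed(ADMIN_ONLY, "/admin/settings"))  # blocked by Disallow: /admin/
print(allowed(ADMIN_ONLY, "/blog/post"))       # not covered by any rule
print(allowed(BLOCK_ALL, "/blog/post"))        # blocked by Disallow: /
```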
Allow
The "Allow" value works as the opposite of "Disallow": it permits search robots to access a specific page or directory even if other directives in the robots.txt file prohibit access to it.
Consider an example:
User-agent: *
Disallow: /admin/
Allow: /admin/login.html
This example specifies that robots are not allowed to access the /admin/ directory, except for the /admin/login.html page, which remains available for crawling and indexing.
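The exception above works because crawlers that follow the Robots Exclusion Protocol (RFC 9309) apply the most specific, that is the longest, matching rule, and prefer Allow when two rules tie. The sketch below illustrates that precedence for plain path prefixes only; the is_allowed function and its rule list format are hypothetical, and wildcard handling is deliberately left out.

```python
def is_allowed(path, rules):
    """Decide access using longest-match precedence.

    rules is a list of (directive, prefix) pairs, for example
    ("Disallow", "/admin/"). The longest matching prefix wins;
    on equal length, Allow beats Disallow. No match means allowed.
    """
    best = None  # (prefix length, is_allow) of the best match so far
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            candidate = (len(prefix), directive.lower() == "allow")
            # Tuple comparison: longer prefix first, then Allow over Disallow.
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


RULES = [
    ("Disallow", "/admin/"),
    ("Allow", "/admin/login.html"),
]

print(is_allowed("/admin/login.html", RULES))  # longer Allow rule wins
print(is_allowed("/admin/users", RULES))       # only Disallow matches
print(is_allowed("/blog/", RULES))             # no rule matches
```

Because the decision depends on rule length rather than position, the result is the same whichever order the two lines appear in the file.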
Robots.txt and the Sitemap
A sitemap is an XML file that contains a list of URLs of all pages and files on the site that search engines can index. When a search robot reads the robots.txt file and sees a link to the sitemap XML file, it can use that file to find all the URLs and resources on the site. The directive is specified in the format:
Sitemap: https://yoursite.com/filesitemap.xml
This rule is usually placed at the end of the document without being tied to a specific User-Agent, and all robots process it without exception. If the site owner does not use sitemap.xml, there is no need to add the rule.
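Since the Sitemap line is independent of any User-Agent group, a parser can read it regardless of where it sits in the file. The following minimal sketch uses the site_maps() method of Python's standard urllib.robotparser (available from Python 3.8); the surrounding Disallow rule and the find_sitemaps helper are assumptions added for illustration.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://yoursite.com/filesitemap.xml
"""


def find_sitemaps(text):
    """Return the list of Sitemap URLs declared in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


print(find_sitemaps(ROBOTS_TXT))
```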
Robots.txt Examples
Configuring Robots.txt for WordPress
In this section, we consider a ready-made configuration for WordPress. We look at blocking access to confidential data and allowing access to the main pages.
As a ready-made solution, you can use the following code:
User-agent: *
# Block access to files containing confidential data
Disallow: /cgi-bin
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /wp-register.php
Disallow: /xmlrpc.php
# Allow access to the main site pages
Allow: /wp-content/uploads/
Allow: /sitemap.xml
Allow: /feed/
Allow: /trackback/
Allow: /comments/feed/
Allow: /category/*/*
Allow: /tag/*
# Prohibit the indexing of old versions of posts and parameterized queries to avoid content duplication or suboptimal indexing.
Disallow: /*?*
Disallow: /?s=*
Disallow: /?p=*
Disallow: /?page_id=*
Disallow: /?cat=*
Disallow: /?tag=*
# Include the sitemap (location needs to be replaced with your own)
Sitemap: http://yourdomain.com/sitemap.xml
Although every directive is accompanied by a comment, let's look more closely at what this configuration achieves.
- Robots will not index sensitive files and directories.
- At the same time, robots are allowed to access the site's main pages and resources.
- Indexing of old post versions and parameterized queries is prohibited to prevent duplicate content.
- The sitemap location is specified for better indexing.
Thus, we have considered a general example of a ready-made configuration in which certain sensitive files and paths are hidden from indexing while the main directories remain accessible.
Unlike many popular CMSs or custom-built sites, WordPress has numerous plugins that simplify creating and managing the robots.txt file. One of the most popular solutions for this purpose is Yoast SEO.
To install it, you need to:
- Go to the WordPress admin panel.
- In the "Plugins" section, select "Add New".
- Find the "Yoast SEO" plugin and install it.
- Activate the plugin.
To edit the robots.txt file, you need to:
- Go to the "SEO" section in the side panel menu and select "General".
- Go to the "Tools" tab.
- Click "Files". Here you will see various files, including robots.txt.
- Enter the indexing rules according to your requirements.
- After making changes to the file, click the "Save changes to robots.txt" button.
Note that each robots.txt configuration for WordPress is unique and depends on the site's specific needs and features. There is no universal template that suits every resource without exception. However, this example and the use of plugins can greatly simplify the task.
Manual Robots.txt Configuration
Similarly, you can write your own configuration even if the site does not run on a ready-made CMS. In that case, too, you need to upload the robots.txt file to the site's root directory and specify the required rules. Here is one example that uses all of the directives discussed above:
User-agent: *
Disallow: /admin/ # Prohibit access to the administrative panel
Disallow: /secret.html # Prohibit access to a specific file
Disallow: /*.pdf$ # Prohibit indexing of certain file types
Disallow: /*?sort= # Prohibit indexing of certain URL parameters
Allow: /public/ # Allow access to public pages
Sitemap: http://yourdomain.com/sitemap.xml # Include the sitemap
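The "*" and "$" characters in the rules above are pattern syntax understood by major crawlers: "*" stands for any sequence of characters and a trailing "$" anchors the pattern to the end of the URL. One way to reason about them is to translate each pattern into a regular expression, as in this rough sketch; the pattern_to_regex and matches helpers and the sample paths are hypothetical.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' requires the
    URL to end there. Without '$' the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def matches(pattern, path):
    """Return True if the URL path falls under the robots.txt pattern."""
    return pattern_to_regex(pattern).match(path) is not None


# Assumed sample paths, chosen to exercise the rules from the example.
print(matches("/*.pdf$", "/docs/report.pdf"))         # ends in .pdf
print(matches("/*.pdf$", "/docs/report.pdf?page=2"))  # does not end in .pdf
print(matches("/*?sort=", "/shop/items?sort=price"))  # contains ?sort=
print(matches("/admin/", "/admin/users"))             # plain prefix match
```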
How to Check the Robots.txt File
As an auxiliary tool for checking the robots.txt file for errors, using online services is recommended.
Consider the Yandex Webmaster service as an example. To run a check, enter a link to your site in the appropriate field if the file has already been uploaded to the server. The tool will then load the file's configuration itself. There is also an option to enter the configuration manually.
Next, request the check and wait for the result.
In the example given, there are no errors. If there are any, the service will point out the problem areas and ways to fix them.
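Before uploading the file, a quick local pass can catch simple mistakes of the kind such a service reports. The sketch below is an assumption-laden illustration, not a replacement for an online validator: the lint_robots function is hypothetical, and it recognizes only the four directives covered in this article, so any other directive is reported as unknown.

```python
KNOWN_DIRECTIVES = {"user-agent", "disallow", "allow", "sitemap"}


def lint_robots(text):
    """Return a list of (line number, message) pairs for suspicious lines."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        # Drop comments and surrounding whitespace; skip empty lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        if ":" not in line:
            problems.append((number, "missing ':' separator"))
            continue
        name = line.partition(":")[0].strip().lower()
        if name not in KNOWN_DIRECTIVES:
            problems.append((number, f"unknown directive '{name}'"))
        elif name == "user-agent":
            seen_agent = True
        elif name in {"allow", "disallow"} and not seen_agent:
            problems.append(
                (number, f"'{name}' appears before any User-agent line"))
    return problems


SAMPLE = "Disallow: /tmp/\nUser-agent: *\nDissalow: /admin/\n"
for number, message in lint_robots(SAMPLE):
    print(f"line {number}: {message}")
```

A clean file yields an empty list; the sample reports the rule placed before any User-agent line and the misspelled directive.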
Conclusion
In summary, we have emphasized the importance of the robots.txt file for managing crawler traffic on a site. We gave recommendations on how to configure it properly to control how search engines index pages. In addition, we looked at examples of using the file correctly and explained how to verify that all settings work as intended.