Search crawl logs play a crucial role in troubleshooting search issues in SharePoint Server 2013. But it's very difficult for a SharePoint administrator to go through the logs 50 at a time, navigating between different sets of logs, to understand an issue and spot a pattern in the errors or warnings. Sorting and filtering is much easier once all the details are in Excel. The Search Service Application UI doesn't provide any interface for exporting the logs, but they can be retrieved from the Search Service Application using the 'Microsoft.Office.Server.Search.Administration.CrawlLog' class. This class has a method called 'GetCrawledUrls' which retrieves the logs. The 2013 documentation marks this method as obsolete, so be cautious if you plan to use this script on SharePoint Server 2016 or later.
PowerShell Script to export Crawl Logs
Below is a PowerShell script which retrieves the crawl logs and exports them to a CSV file that can be opened in Excel.
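# Load the SharePoint snap-in when not running in the SharePoint Management Shell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue
$errorsFileName = "D:\Ram\CrawlLogs.csv"
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"
$logs = New-Object Microsoft.Office.Server.Search.Administration.CrawlLog $ssa
# Up to 10000 rows for content source 1, error level 2 (errors), any error ID, any date
$logs.GetCrawledUrls($false,10000,"",$false,1,2,-1,[System.DateTime]::MinValue,[System.DateTime]::MaxValue) | Export-Csv -NoTypeInformation $errorsFileName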
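One thing to watch for when piping a DataTable straight into Export-Csv: PowerShell emits DataRow objects, so the CSV usually picks up bookkeeping columns such as RowError, RowState, Table, ItemArray and HasErrors next to the real data. Here is a minimal sketch of a helper that keeps only the table's own columns. The function name ConvertTo-PlainRow is my own and not part of SharePoint; $logs and $errorsFileName are the variables from the script above.

```powershell
# Hypothetical helper (not a SharePoint cmdlet): turns the rows of a DataTable
# into plain objects that carry only the table's own columns.
function ConvertTo-PlainRow {
    param(
        [Parameter(Mandatory = $true)]
        [System.Data.DataTable] $Table
    )
    # Read the column names from the table itself; nothing is hard-coded.
    $columns = @($Table.Columns | ForEach-Object { $_.ColumnName })
    # Select-Object copies just those columns, leaving behind DataRow members
    # such as RowError, RowState, Table, ItemArray and HasErrors.
    $Table.Rows | Select-Object -Property $columns
}

# Usage with the script above (needs a SharePoint farm, so left commented out):
# $table = $logs.GetCrawledUrls($false,10000,"",$false,1,2,-1,[System.DateTime]::MinValue,[System.DateTime]::MaxValue)
# ConvertTo-PlainRow -Table $table | Export-Csv -NoTypeInformation $errorsFileName
```

The result is assigned to a variable first so that the DataTable reaches the helper as a single object instead of being unrolled in the pipeline.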
Explaining the method's Parameters
The 'GetCrawledUrls' method has the following signature and parameters:
GetCrawledUrls(bool getCountOnly,long maxRows,string urlQueryString,bool isLike,int contentSourceID,int errorLevel,int errorID,DateTime startDateTime,DateTime endDateTime)
- Return Value - DataTable
- getCountOnly - If true, returns only the count of URLs matching the given parameters.
- maxRows - The maximum number of rows to retrieve.
- urlQueryString - The prefix value to be used for matching the URLs
- isLike - If true, all URLs that start with 'urlQueryString' will be returned.
- contentSourceID - The ID of the content source for which crawl logs should be retrieved. If -1 is specified, URLs will not be filtered by content source. See 'Get Content Source ID' below for how to find this value.
- errorLevel - Only URLs with the specified error level will be returned. Possible values:
  -1 : Do not filter by error level.
  0 : Return only successfully crawled URLs.
  1 : Return URLs that generated a warning when crawled.
  2 : Return URLs that generated an error when crawled.
  3 : Return URLs that have been deleted.
  4 : Return URLs that generated a top level error.
- errorID - Only URLs with this error ID will be returned. If -1 is supplied, URLs will not be filtered by error ID.
- startDateTime - Start date and time. Only logs after this point are retrieved.
- endDateTime - End date and time. Only logs up to this point are retrieved.
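Nine positional arguments are easy to mix up, so it can help to wrap the call in a function with named parameters. The sketch below is only an illustration: Get-CrawlLogRows and the level names (Any, Success, Warning and so on) are my own labels for the numeric values listed above, and the function assumes it is handed a CrawlLog object created as in the script.

```powershell
# Hypothetical wrapper (not part of SharePoint) around CrawlLog.GetCrawledUrls.
function Get-CrawlLogRows {
    param(
        [Parameter(Mandatory = $true)] $CrawlLog,   # CrawlLog object from New-Object
        [int] $ContentSourceId = -1,                # -1 = all content sources
        [ValidateSet('Any', 'Success', 'Warning', 'Error', 'Deleted', 'TopLevelError')]
        [string] $ErrorLevel = 'Any',
        [long] $MaxRows = 10000,
        [datetime] $Start = [datetime]::MinValue,
        [datetime] $End = [datetime]::MaxValue
    )
    # Numeric values as listed above; the key names are illustrative labels.
    $levels = @{ Any = -1; Success = 0; Warning = 1; Error = 2; Deleted = 3; TopLevelError = 4 }
    $level = $levels[$ErrorLevel]
    # Argument order follows the signature quoted above: no count-only mode,
    # no URL prefix filter, no error ID filter (-1).
    # On output PowerShell enumerates the returned DataTable, so callers get its rows.
    $CrawlLog.GetCrawledUrls($false, $MaxRows, "", $false, $ContentSourceId, $level, -1, $Start, $End)
}

# Usage on a farm (commented out because it needs SharePoint):
# Get-CrawlLogRows -CrawlLog $logs -ContentSourceId 1 -ErrorLevel Warning
```

With this in place, switching from errors to warnings or widening the search to all content sources means changing one named argument.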
Get Content Source ID
- Open the Search Service Application.
- Click on the Content Sources from the left navigation menu.
- This page lists all the content sources. Click the content source you need and check the URL in the address bar. The value of the query string parameter cid is the content source ID.
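The same ID can also be read from PowerShell instead of the address bar. The sketch below assumes the Get-SPEnterpriseSearchCrawlContentSource cmdlet is available (it is skipped otherwise) and that the content source is called "Local SharePoint sites", which is only an example name. Get-ContentSourceId is a hypothetical helper of my own.

```powershell
# Hypothetical helper: returns the Id of the content source with the given name.
function Get-ContentSourceId {
    param(
        [Parameter(Mandatory = $true)] $ContentSources,
        [Parameter(Mandatory = $true)] [string] $Name
    )
    $found = @($ContentSources | Where-Object { $_.Name -eq $Name })
    if ($found.Count -ne 1) {
        throw "Expected exactly one content source named '$Name', found $($found.Count)."
    }
    $found[0].Id
}

# Runs only where the SharePoint search cmdlets are loaded.
if (Get-Command Get-SPEnterpriseSearchCrawlContentSource -ErrorAction SilentlyContinue) {
    $ssa = Get-SPEnterpriseSearchServiceApplication -Identity "Search Service Application"
    $sources = Get-SPEnterpriseSearchCrawlContentSource -SearchApplication $ssa
    # List every content source with its ID ...
    $sources | Select-Object Name, Id
    # ... or look up a single one by name (example name, adjust to your farm).
    Get-ContentSourceId -ContentSources $sources -Name "Local SharePoint sites"
}
```

The returned number is the value to pass as contentSourceID.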
Based on this PowerShell snippet, we can even write a small utility which sends a daily mail with the list of errors that occurred in the previous day's crawl. Check my next post for details on how to do this.
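As a rough starting point, the date window for "the previous day" can be computed as below. This is only a sketch under my own assumptions, not the utility from the next post: Get-PreviousDayWindow is a hypothetical helper, the mail addresses and SMTP server in the commented usage are placeholders, and I have not checked whether the crawl log compares these dates in local time or UTC.

```powershell
# Hypothetical helper: start and end of the calendar day before $Now.
function Get-PreviousDayWindow {
    param([datetime] $Now = (Get-Date))
    $end = $Now.Date             # midnight at the start of today
    $start = $end.AddDays(-1)    # midnight at the start of yesterday
    [PSCustomObject]@{ Start = $start; End = $end }
}

# Usage on a farm (commented out; addresses and server are placeholders):
# $window = Get-PreviousDayWindow
# $logs.GetCrawledUrls($false,10000,"",$false,-1,2,-1,$window.Start,$window.End) |
#     Export-Csv -NoTypeInformation $errorsFileName
# Send-MailMessage -To "admin@example.com" -From "search@example.com" `
#     -Subject "Crawl errors for $($window.Start.ToString('yyyy-MM-dd'))" `
#     -Body "Crawl errors are attached." -Attachments $errorsFileName `
#     -SmtpServer "smtp.example.com"
```

Scheduling the script once a day, for example with Task Scheduler, would then produce one report per day.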