Recently I observed the following error in our SharePoint farm's crawl log for multiple items across our site collections:
The content processing pipeline failed to process the item. ( Index was out of range. Must be non-negative and less than the size of the collection. Parameter name: index; ; SearchID = XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXXX)
Root Cause
After some troubleshooting I found that this crawl error is caused by a mismatch between the settings of Managed Properties in the Search Schema and those of the Managed Metadata Site Columns. When the ‘Allow Multiple Values’ setting differs between a Managed Metadata Site Column and its corresponding Managed Property (or properties), this error occurs and the affected SharePoint items do not get crawled.
Solution
Make sure ‘Allow Multiple Values’ is set to the same value on the Managed Metadata Site Column and on the Managed Property that has the site column's Crawled Property mapped to it. Both should be set to Yes, or both to No.
Make these changes and run a full crawl. For some, this resolves the issue. If it does not, check the ULS logs, where you might find an error message like the one shown below:
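The check described above can be sketched in a few lines of Python. This is only an illustration, assuming you have already collected the ‘Allow Multiple Values’ settings by hand (for example from the site column and Search Schema pages). The `ColumnMapping` class, the `find_mismatches` function, and the sample column and property names are all hypothetical and are not part of any SharePoint API.

```python
from dataclasses import dataclass


@dataclass
class ColumnMapping:
    """One site column and the managed property its crawled property maps to.

    All values are assumed to be collected manually; nothing here talks to SharePoint.
    """
    site_column: str
    column_allows_multiple: bool
    managed_property: str
    property_allows_multiple: bool


def find_mismatches(mappings):
    """Return the mappings whose 'Allow Multiple Values' settings differ."""
    return [
        m for m in mappings
        if m.column_allows_multiple != m.property_allows_multiple
    ]


# Hypothetical sample data, for illustration only.
sample = [
    ColumnMapping("Department", True, "DepartmentTags", True),
    ColumnMapping("Region", True, "RegionTag", False),   # mismatch
    ColumnMapping("DocType", False, "DocTypeTag", False),
]

for m in find_mismatches(sample):
    print(f"{m.site_column} -> {m.managed_property}: settings differ")
```

Any pair the sketch reports is a candidate for the fix described above.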
[Microsoft.CrawlerFlow-XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX] Microsoft.Ceres.ContentEngine.Processing.BuiltIn.AttributeMapperEvaluator+AttributeMapperProducer : Failed to map values to field ExcludeFromSummary
If you see this in the log, then your ‘ExcludeFromSummary’ Managed Property has an incorrect setting. Even when your site column has ‘Allow Multiple Values’ set to Yes, the ‘ExcludeFromSummary’ Managed Property must have ‘Allow Multiple Values’ set to No.
Make sure the ‘ExcludeFromSummary’ Managed Property is configured as described above, then run a full crawl to clear this error. All items in your site collection(s) should now be crawled successfully.
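If the ULS logs are large, a small script can help find which fields fail to map. The following is a minimal sketch, assuming the log lines contain the text "Failed to map values to field" followed by the field name, as in the message above. The `failed_fields` function is a hypothetical helper of my own, and the sample line is copied from the message above, not from a real log file.

```python
import re

# Matches the field name that follows the phrase seen in the ULS message.
FAILED_MAP = re.compile(r"Failed to map values to field (\S+)")


def failed_fields(lines):
    """Return the sorted, de-duplicated field names that failed to map."""
    fields = set()
    for line in lines:
        match = FAILED_MAP.search(line)
        if match:
            fields.add(match.group(1))
    return sorted(fields)


# Sample line taken from the error message quoted in this post.
sample_lines = [
    "[Microsoft.CrawlerFlow-XXXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX] "
    "Microsoft.Ceres.ContentEngine.Processing.BuiltIn."
    "AttributeMapperEvaluator+AttributeMapperProducer : "
    "Failed to map values to field ExcludeFromSummary",
    "Some unrelated log line",
]

print(failed_fields(sample_lines))
```

To run it against a real log, read the file line by line and pass the lines to `failed_fields`; the file location and encoding depend on your farm's configuration.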