Spider Not Returning All Results After Changing My Item Pipelines To An If And Elif Statement
Solution 1:
I managed to reproduce the issue when testing with SQLite3, and the number of errors in the Scrapy log matched the number of missing entries. The errors were caused by unescaped single quotes in the BristolQualification item field (presumably the Bath spider suffers from the same problem), such as the one in d'Etudes in the snippet below:
Candidates holding a Dipl\xf4me de Technicien Sup\xe9rieur / Sciences Appliqu\xe9es with suitable grades or those with the Dipl\xf4me d'Etudes Universitaires G\xe9n\xe9rales (DEUG) with good grades in suitable subjects will be considered for appropriate undergraduate courses.
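To illustrate why a single quote like this can make an insert fail, here is a minimal sketch. It assumes the original pipeline built its SQL with string formatting, which is not shown in this answer; the table layout, the `country` value and the shortened qualification text are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bristol (BristolCountry TEXT, BristolQualification TEXT)")

country = "France"  # hypothetical value
qualification = "Diplôme d'Etudes Universitaires Générales (DEUG)"

# Assumed, hypothetical way of building the statement: the apostrophe in
# "d'Etudes" closes the SQL string literal early, so the remaining text is
# parsed as SQL and the statement is rejected.
sql = "INSERT INTO Bristol VALUES ('%s', '%s')" % (country, qualification)

try:
    conn.execute(sql)
    failed = False
except sqlite3.OperationalError as exc:
    failed = True
    print("Insert failed:", exc)

# The row never reaches the table, which is how entries go missing.
row_count = conn.execute("SELECT COUNT(*) FROM Bristol").fetchone()[0]
```

Each failed statement of this kind would log one error and lose one row, which would be consistent with the error count matching the number of missing entries.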
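I got it working (at least with SQLite3) by breaking up the join and encoding of the qualification item field into separate steps and passing the values to execute() as query parameters, so the database driver takes care of the quoting. The code below should work, but note that it is untested with MySQL; MySQLdb uses %s rather than ? as its placeholder, so the statements need that change there. If entries are still missing, check the errors in the Scrapy log and let me know.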
def process_item(self, item, spider):
    try:
        if 'BristolQualification' in item:
            # Join the extracted text fragments into a single string.
            qualification = ''.join(item['BristolQualification'])
            # Note: encode() returns a new string; the result is not used here.
            qualification.encode('utf8')
            # The values are passed as parameters, so quotes need no escaping.
            self.cursor.execute("INSERT INTO Bristol(BristolCountry, BristolQualification) VALUES (?, ?)", (item['BristolCountry'], qualification))
        elif 'BathQualification' in item:
            qualification = ''.join(item['BathQualification'])
            qualification.encode('utf8')
            self.cursor.execute("INSERT INTO Bath(BathCountry, BathQualification) VALUES (?, ?)", (item['BathCountry'], qualification))
        self.conn.commit()
        return item
    except MySQLdb.Error as e:
        print("Error %d: %s" % (e.args[0], e.args[1]))
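As a quick standalone check of the same idea, the sketch below runs the if/elif dispatch against an in-memory SQLite database. The `store_item` helper, the table definitions and the sample items are assumptions made for illustration; they are not taken from the original spiders.

```python
import sqlite3


def store_item(conn, item):
    """Insert the item into the Bristol or Bath table, depending on its keys."""
    if "BristolQualification" in item:
        qualification = "".join(item["BristolQualification"])
        # Values are bound through ? placeholders, so the apostrophe in
        # "d'Etudes" is stored as data and cannot break the statement.
        conn.execute(
            "INSERT INTO Bristol(BristolCountry, BristolQualification) VALUES (?, ?)",
            (item["BristolCountry"], qualification),
        )
    elif "BathQualification" in item:
        qualification = "".join(item["BathQualification"])
        conn.execute(
            "INSERT INTO Bath(BathCountry, BathQualification) VALUES (?, ?)",
            (item["BathCountry"], qualification),
        )
    conn.commit()
    return item


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Bristol (BristolCountry TEXT, BristolQualification TEXT)")
conn.execute("CREATE TABLE Bath (BathCountry TEXT, BathQualification TEXT)")

# Hypothetical items, shaped like lists of extracted text fragments.
store_item(conn, {"BristolCountry": "France",
                  "BristolQualification": ["Diplôme d'Etudes ", "Universitaires"]})
store_item(conn, {"BathCountry": "France",
                  "BathQualification": ["Baccalauréat"]})

bristol_rows = conn.execute("SELECT BristolQualification FROM Bristol").fetchall()
bath_rows = conn.execute("SELECT BathQualification FROM Bath").fetchall()
```

With MySQLdb the structure would stay the same, with %s in place of each ? placeholder.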