Need Proxy?

BotProxy: Rotating Proxies Made for professionals. Really fast connection. Built-in IP rotation. Fresh IPs every day.

Find out more


(Python) Scrapy - How to scrape a JS dropdown list?

Question

I want to scrape the javascript list of the 'size' section of this address:

http://store.nike.com/us/en_us/pd/magista-opus-ii-tech-craft-2-mens-firm-ground-soccer-cleat/pid-11229710/pgid-11918119

What I want to do is get the sizes that are in stock, it will return a list. How would I be able to do it?

Here's my full code:

# -*- coding: utf-8 -*-
from scrapy import Spider
from scrapy.http import Request

class ShoesSpider(Spider):
    name = "shoes"
    allowed_domains = ["store.nike.com"]
    start_urls = ['http://store.nike.com/us/en_us/pd/magista-opus-ii-tech-craft-2-mens-firm-ground-soccer-cleat/pid-11229710/pgid-11918119']

    def parse(self, response):       
        shoes = response.xpath('//*[@class="grid-item-image-wrapper sprite-sheet sprite-index-0"]/a/@href').extract()
        for shoe in shoes:
            yield Request(shoe, callback=self.parse_shoes) 

    def parse_shoes(self, response):
        name = response.xpath('//*[@itemprop="name"]/text()').extract_first()
        price = response.xpath('//*[@itemprop="price"]/text()').extract_first()
        #sizes = ??

        yield {
            'name' : name,
            'price' : price,
            'sizes' : sizes
        }

Thanks

Answer

Here is the code to extract sizes in stock.

import scrapy


class ShoesSpider(scrapy.Spider):
    name = "shoes"
    allowed_domains = ["store.nike.com"]
    start_urls = ['http://store.nike.com/us/en_us/pd/magista-opus-ii-tech-craft-2-mens-firm-ground-soccer-cleat/pid-11229710/pgid-11918119']

    def parse(self, response):
        sizes = response.xpath('//*[@class="nsg-form--drop-down exp-pdp-size-dropdown exp-pdp-dropdown two-column-dropdown"]/option')


        for s in sizes:
            size = s.xpath('text()[not(parent::option/@class="exp-pdp-size-not-in-stock selectBox-disabled")]').extract_first('').strip()
            yield{'Size':size}


Here is the result:

M 4 / W 5.5
M 4.5 / W 6
M 6.5 / W 8
M 7 / W 8.5
M 7.5 / W 9
M 8 / W 9.5
M 8.5 / W 10
M 9 / W 10.5

In the for loop, if we write it like this, it will extract all the sizes, whether they are in stock or not.

size = s.xpath('text()').extract_first('').strip()


But if you want to get those that are in stock only, they are marked with the class "exp-pdp-size-not-in-stock selectBox-disabled" which you have to exclude through adding this:

[not(parent::option/@class="exp-pdp-size-not-in-stock selectBox-disabled")]



I have tested it on other shoe pages, and it works as well.

cc by-sa 3.0