Fix Code Error

Scraping text after a span with Regex (and Requests)

July 2, 2021 by Code Error
Posted By: Anonymous

I have an unformatted and messy bs4.BeautifulSoup element from a webpage. The soup looks like this.

soup = ' </span><span class="productConfiguration__shippingDateEnd">Jul 30, 2021</span>"},{"id":"50014999","description":null,"displayValue":"M","value":"M","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=M&pid=2947&quantity=1","hasComingSoon":true,"hasAllComingSoonAttr":true,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=M&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":178,"sizeMax":184,"measurementInterval":"178 cm - 184 cm","comingSoonReason":"productOrPreferenceInstockDate","comingSoon":true,"availability":{"messages":["Back order"],"inStockDate":"2021-08-09T00:00:00.000Z","onlyXLeftNumber":122,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Coming in August 2021","available":false,"availableSufficient":true,"notifyMe":true,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Coming in August 2021"},
{"id":"50015000","description":null,"displayValue":"L","value":"L","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=L&pid=2947&quantity=1","hasComingSoon":true,"hasAllComingSoonAttr":true,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=L&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":184,"sizeMax":190,"measurementInterval":"184 cm - 190 cm","comingSoonReason":"productOrPreferenceInstockDate","comingSoon":true,"availability":{"messages":["Back order"],"inStockDate":"2021-08-16T00:00:00.000Z","onlyXLeftNumber":96,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Coming in August 2021","available":false,"availableSufficient":true,"notifyMe":true,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Coming in August 2021"},
{"id":"50015001","description":null,"displayValue":"XL","value":"XL","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=XL&pid=2947&quantity=1","hasComingSoon":true,"hasAllComingSoonAttr":true,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=XL&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":190,"sizeMax":196,"measurementInterval":"190 cm - 196 cm","comingSoonReason":"productOrPreferenceInstockDate","comingSoon":true,"availability":{"messages":["Back order"],"inStockDate":"2021-08-09T00:00:00.000Z","onlyXLeftNumber":38,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Coming in August 2021","available":false,"availableSufficient":true,"notifyMe":true,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Coming in August 2021"},
{"id":"50015002","description":null,"displayValue":"2XL","value":"2XL","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=2XL&pid=2947&quantity=1","hasComingSoon":false,"hasAllComingSoonAttr":false,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=2XL&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":196,"sizeMax":999,"measurementInterval":"> 196 cm","comingSoonReason":"","comingSoon":false,"availability":{"messages":["Back order"],"inStockDate":"2021-07-26T00:00:00.000Z","onlyXLeftNumber":10,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Shipping <span class="productConfiguration__shippingDate">Jul 26, 2021</span><span class="productConfiguration__shippingDateSeparator"> - </span><span class="productConfiguration__shippingDateEnd">Jul 30, 2021</span>","available":true,"availableSufficient":true,"notifyMe":false,"showOutOfStock":false,"similarBikes":false,"comingSoonByBackOrderAllocation":false},"hasSuccessorProduct":false,"comingSoonMessage":"Shipping <span class="productConfiguration__shippingDate">Jul 26, 2021</span><span class="productConfiguration__shippingDateSeparator"> - </span><span class="productConfiguration__shippingDateEnd">Jul 30, 2021</span>"}],"resetUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=&pid=2947&quantity=1","hasSelectedValue":false,"isLastAttributeOnPDP":true,"colorAttribute":false,"sizeAttribute":true,"buttonAttribute":false,"damagedAttribute":false}]}};</script>' 

I need the element that comes after the span with class productConfiguration__shippingDateEnd, i.e. the "id" dictionary, so that I end up with something like this after the search.

{"id":"50015002","description":null,"displayValue":"2XL","value":"2XL","selected":false,"selectable":true,"url":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Variation?dwvar_2947_pv_rahmenfarbe=YE%2FBK&dwvar_2947_pv_rahmengroesse=2XL&pid=2947&quantity=1","hasComingSoon":false,"hasAllComingSoonAttr":false,"configurationUrl":"https://www.xzy.com/on/demandware.store/Sites-RoW-Site/en_DE/Product-Configure?pid=2947&dwvar_2947_pv_rahmengroesse=2XL&dwvar_2947_pv_rahmenfarbe=YE%2fBK","sizeMin":196,"sizeMax":999,"measurementInterval":"> 196 cm","comingSoonReason":"","comingSoon":false,"availability":{"messages":["Back order"],"inStockDate":"2021-07-26T00:00:00.000Z","onlyXLeftNumber":10,"onlyXLeft":false,"lowStock":false,"shippingInfo":"Shipping}' 

If I do soup1.find_all('span', class_='productConfiguration__shippingDateEnd'), I only get the result below. Also, .next_siblings doesn't return anything.

[<span class="productConfiguration__shippingDateEnd">Jul 30, 2021</span>,
[<span class="productConfiguration__shippingDateEnd">Jul 30, 2021</span>,

Any ideas how I can go about this?

Thanks a lot for your help.

Solution

What I see looks slightly different from what is shown in the question, but it contains the stock info by size. You can use a regex to extract the JSON string from the page source, then the json module to turn that string into a Python object.

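import json
import re
from pprint import pprint

import requests
from bs4 import BeautifulSoup as bs

# Fetch the product page; the stock data is assigned to window.deptsfra inside a <script> tag.
r = requests.get('https://www.canyon.com/en-de/road-bikes/endurance-bikes/endurace/cf-sl/endurace-cf-sl-7-disc/2947.html?dwvar_2947_pv_rahmenfarbe=YE%2FBK')
# Capture everything between "window.deptsfra=" and the last ";" on that line.
s = re.search(r'window\.deptsfra\s*=\s*(.*);', r.text).group(1)
#print(s)
data = json.loads(s)  # parse the captured string into Python dicts and lists
print(data)

# Index 1 of variationAttributes appears to be the size attribute, with one entry per size.
pprint(data['productDetail']['variationAttributes'][1]['values'])

for i in data['productDetail']['variationAttributes'][1]['values']:
    print(i['value'], i['availability'])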
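
The same two steps (regex, then json) can be tried without a network call. The sketch below is only an illustration: the sample_html string is a made-up, heavily shortened stand-in for the real page source, and extract_state is a hypothetical helper name. It returns None when the marker is missing instead of failing with an AttributeError on .group(1).

```python
import json
import re

# Hypothetical, heavily shortened stand-in for the real page source.
sample_html = (
    '<script>window.deptsfra={"productDetail":{"variationAttributes":['
    '{"values":[]},'
    '{"values":[{"value":"M","availability":{"shippingInfo":"Coming in August 2021"}}]}'
    ']}};</script>'
)

def extract_state(html, name="window.deptsfra"):
    """Return the JSON assigned to `name` in `html`, or None if it is absent."""
    match = re.search(re.escape(name) + r'\s*=\s*(.*);', html)
    if match is None:
        return None
    return json.loads(match.group(1))

state = extract_state(sample_html)
sizes = state['productDetail']['variationAttributes'][1]['values']
print([v['value'] for v in sizes])
```

Checking for None before calling json.loads makes it easier to tell "the site changed its markup" apart from "the JSON is malformed".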

Shipping info by size, as a dict:

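values = data['productDetail']['variationAttributes'][1]['values']
# Strip the <span> markup from shippingInfo when present; plain strings pass through unchanged.
results = {
    i['value']: (bs(i['availability']['shippingInfo'], 'html.parser').get_text()
                 if '<' in i['availability']['shippingInfo']
                 else i['availability']['shippingInfo'])
    for i in values
}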
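
If BeautifulSoup is not at hand for this last step, the tag stripping can be approximated with the standard library. This is a swapped-in technique, not the method used above: the strip_tags helper and the sample values list are assumptions made for illustration, and a regex tag stripper is only reasonable for tiny, well-formed fragments like these.

```python
import html
import re

# Hypothetical sample shaped like the 'values' entries shown in the question.
values = [
    {"value": "M", "availability": {"shippingInfo": "Coming in August 2021"}},
    {"value": "2XL", "availability": {"shippingInfo":
        'Shipping <span class="productConfiguration__shippingDate">Jul 26, 2021</span>'
        '<span class="productConfiguration__shippingDateSeparator"> - </span>'
        '<span class="productConfiguration__shippingDateEnd">Jul 30, 2021</span>'}},
]

def strip_tags(text):
    """Drop anything that looks like an HTML tag, then decode entities."""
    return html.unescape(re.sub(r'<[^>]+>', '', text))

results = {v["value"]: strip_tags(v["availability"]["shippingInfo"]) for v in values}
print(results)
```

Strings without markup pass through unchanged, so no separate check for '<' is needed in this variant.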


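Regex explanation:

The pattern matches the text window.deptsfra= and captures everything after it with the greedy group (.*), up to the last semicolon on the same line (by default, . does not match a newline). The captured text is the JSON string that is passed to json.loads.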
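
To see why the greedy group matters, it can be compared with a lazy one on a small made-up string whose JSON contains a semicolon. The sketch also shows json.JSONDecoder.raw_decode, which parses one JSON value starting at a given index and ignores whatever follows, as an alternative that does not depend on where the line ends. The text sample is an assumption for illustration, not real page content.

```python
import json
import re

# Hypothetical script content; note the ';' inside the JSON string.
text = 'window.deptsfra={"msg":"a;b","n":1};</script>'

greedy = re.search(r'window\.deptsfra=(.*);', text).group(1)
lazy = re.search(r'window\.deptsfra=(.*?);', text).group(1)

print(greedy)  # the full object, up to the last ';' on the line
print(lazy)    # cut off at the first ';', so it is not valid JSON

# Alternative: let the JSON decoder find the end of the object itself.
marker = 'window.deptsfra='
start = text.index(marker) + len(marker)
obj, end = json.JSONDecoder().raw_decode(text, start)
print(obj, text[end:])
```

raw_decode expects the JSON value to begin exactly at the given index, so any whitespace after the = sign would have to be skipped first.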

Answered By: Anonymous

Disclaimer: This content is shared under creative common license cc-by-sa 3.0. It is generated from StackExchange Website Network.
