Fix Code Error

Loop and eliminate unwanted lines with Beautiful Soup

July 13, 2021 by Code Error
Posted By: Anonymous

I have an HTML file of a city's ways, from which I want to extract only the secondary ones, along with the lines that follow each of them (extract below):

<node id="8762697302" visible="true" version="1" changeset="105251293" timestamp="2021-05-24T21:31:46Z" user="4TL4S" uid="12781275" lat="19.5021226" lon="-99.1210088"/>
 <node id="8762697303" visible="true" version="1" changeset="105251293" timestamp="2021-05-24T21:31:46Z" user="4TL4S" uid="12781275" lat="19.5021537" lon="-99.1210855"/>
 <node id="8762697304" visible="true" version="1" changeset="105251293" timestamp="2021-05-24T21:31:46Z" user="4TL4S" uid="12781275" lat="19.5021738" lon="-99.1211046"/>
 <node id="8762697305" visible="true" version="1" changeset="105251293" timestamp="2021-05-24T21:31:46Z" user="4TL4S" uid="12781275" lat="19.5022129" lon="-99.1211099"/>
 <way id="24984236" visible="true" version="36" changeset="105251293" timestamp="2021-05-24T21:31:46Z" user="4TL4S" uid="12781275">
  <nd ref="271534238"/>
  <nd ref="271534237"/>
  <nd ref="301605624"/>
  <nd ref="8130722656"/>
  <nd ref="271534236"/>
  <nd ref="301605886"/>
  <nd ref="8490482530"/>
  <nd ref="271534235"/>
  <nd ref="8130722659"/>
  <nd ref="297808621"/>
  <nd ref="5120247163"/>
  <nd ref="8500986642"/>
  <nd ref="8112567831"/>
  <nd ref="8336910886"/>
  <nd ref="8336910883"/>
  <nd ref="8336910885"/>
  <nd ref="8112567832"/>
  <nd ref="8336910884"/>
  <nd ref="8336910887"/>
  <nd ref="271534230"/>
  <nd ref="8112567834"/>
  <nd ref="8762697298"/>
  <nd ref="8112567833"/>
  <nd ref="6348455382"/>
  <tag k="highway" v="secondary"/>
  <tag k="lanes" v="3"/>
  <tag k="name" v="Avenida Acueducto de Guadalupe"/>
  <tag k="oneway" v="yes"/>
  <tag k="surface" v="asphalt"/>
 </way>
 <way id="24984237" visible="true" version="50" changeset="100730322" timestamp="2021-03-09T19:35:26Z" user="TheShiningAlbatross" uid="11724618">
  <nd ref="1789642294"/>
  <nd ref="298263634"/>
  <nd ref="6348437061"/>
  <nd ref="297274089"/>
  <nd ref="8109075718"/>
  <nd ref="297387276"/>
  <nd ref="297274088"/>
  <nd ref="8089031454"/>
  <nd ref="271535272"/>
  <nd ref="297387125"/>
  <nd ref="271535273"/>
  <nd ref="271535274"/>
  <nd ref="8089403582"/>
  <nd ref="5272807864"/>
  <nd ref="271535275"/>
  <nd ref="5272807871"/>
  <nd ref="271535276"/>
  <nd ref="8500972920"/>
  <nd ref="8089235401"/>
  <nd ref="8089235393"/>
  <nd ref="297373675"/>
  <tag k="highway" v="secondary"/>
  <tag k="lanes" v="3"/>
  <tag k="name" v="Avenida Instituto Politécnico Nacional"/>
  <tag k="oneway" v="yes"/>
  <tag k="surface" v="asphalt"/>
 </way>
 <way id="27093652" visible="true" version="5" changeset="100666370" timestamp="2021-03-09T00:06:55Z" user="TheShiningAlbatross" uid="11724618">
  <nd ref="297274089"/>
  <nd ref="8498394999"/>
  <nd ref="8498394998"/>
  <nd ref="297274090"/>
  <nd ref="298256487"/>
  <nd ref="299379524"/>
  <nd ref="297274091"/>
  <nd ref="297274088"/>
  <tag k="highway" v="service"/>
  <tag k="oneway" v="yes"/>
  <tag k="surface" v="asphalt"/>
 </way>
 <way id="27093653" visible="true" version="24" changeset="100661225" timestamp="2021-03-08T20:45:38Z" user="TheShiningAlbatross" uid="11724618">
  <nd ref="8089031455"/>
  <nd ref="8227092924"/>
  <nd ref="298270527"/>
  <nd ref="8227092918"/>
  <nd ref="297275667"/>
  <nd ref="1905088915"/>
  <nd ref="8089365647"/>
  <nd ref="8227089401"/>
  <nd ref="3779095087"/>
  <nd ref="3779095094"/>
  <nd ref="3779095086"/>
  <nd ref="3779095093"/>
  <nd ref="1792764124"/>
  <nd ref="1792764110"/>
  <nd ref="1792767134"/>
  <nd ref="6174887577"/>
  <nd ref="297274093"/>
  <nd ref="1792795130"/>
  <nd ref="8498057567"/>
  <nd ref="297274094"/>
  <nd ref="1792764140"/>
  <nd ref="8088692607"/>
  <nd ref="1792764135"/>
  <nd ref="8490529604"/>
  <nd ref="8490529603"/>
  <nd ref="297274095"/>
  <nd ref="1792764131"/>
  <nd ref="268538192"/>
  <tag k="highway" v="tertiary"/>
  <tag k="lanes" v="2"/>
  <tag k="name" v="Calzada Ticomán"/>
  <tag k="oneway" v="no"/>
  <tag k="surface" v="asphalt"/>
 </way>
 <way id="27093807" visible="true" version="22" changeset="95860337" timestamp="2020-12-15T08:51:08Z" user="Utsunomiya" uid="10074594">
  <nd ref="8089031453"/>
  <nd ref="6360545982"/>
  <nd ref="297275687"/>
  <nd ref="298281142"/>
  <nd ref="298281139"/>
  <nd ref="299381506"/>
  <nd ref="6360545980"/>
  <nd ref="297275694"/>
  <nd ref="297275704"/>
  <nd ref="6360545969"/>
  <nd ref="297275707"/>
  <nd ref="299381507"/>
  <nd ref="1790748535"/>
  <nd ref="297275708"/>
  <nd ref="297275709"/>
  <nd ref="1792449299"/>
  <nd ref="1792449301"/>
  <nd ref="8104327358"/>
  <nd ref="8205206290"/>
  <nd ref="299382462"/>
  <nd ref="8205206222"/>
  <nd ref="8205206221"/>
  <nd ref="8230427925"/>
  <nd ref="8089031453"/>
  <tag k="addr:city" v="Ciudad de Mèxico"/>
  <tag k="amenity" v="university"/>
  <tag k="name" v="Centro de Investigación y de Estudios Avanzados CINVESTAV"/>
  <tag k="operator" v="Instituto Politécnico Nacional"/>
  <tag k="surface" v="asphalt"/>
 </way>
 <way id="27093966" visible="true" version="2" changeset="640886" timestamp="2008-09-15T18:30:34Z" user="yvasilev" uid="23179">
  <nd ref="297277371"/>
  <nd ref="297277373"/>
  <nd ref="297277375"/>
  <nd ref="297277377"/>
  <nd ref="297277371"/>
  <tag k="building" v="yes"/>
  <tag k="created_by" v="Merkaartor 0.11"/>
  <tag k="name" v="Patología Experimental y Fisiología"/>
 </way>
 <way id="27093967" visible="true" version="2" changeset="640886" timestamp="2008-09-15T18:30:35Z" user="yvasilev" uid="23179">
  <nd ref="297277385"/>
  <nd ref="297277388"/>
  <nd ref="297277390"/>
  <nd ref="297277392"/>
  <nd ref="297277385"/>
  <tag k="building" v="yes"/>
  <tag k="created_by" v="Merkaartor 0.11"/>
  <tag k="name" v="Fisiología"/>
 </way>
 <way id="27093969" visible="true" version="2" changeset="640886" timestamp="2008-09-15T18:30:36Z" user="yvasilev" uid="23179">
  <nd ref="297277396"/>
  <nd ref="297277398"/>
  <nd ref="297277400"/>
  <nd ref="297277405"/>
  <nd ref="297277396"/>
  <tag k="building" v="yes"/>
  <tag k="created_by" v="Merkaartor 0.11"/>
  <tag k="name" v="Bioquímica"/>
 </way>
 <way id="27093972" visible="true" version="2" changeset="640886" timestamp="2008-09-15T18:30:36Z" user="yvasilev" uid="23179">
  <nd ref="297277414"/>
  <nd ref="297277415"/>
  <nd ref="297277416"/>
  <nd ref="297277417"/>
  <nd ref="297277414"/>
  <tag k="building" v="yes"/>
  <tag k="created_by" v="Merkaartor 0.11"/>
  <tag k="name" v="Genética, Patología 1a sección"/>
 </way>

So, when I do:

soup.find('way')

I get (I know .find only returns the first result):

<way changeset="105251293" id="24984236" timestamp="2021-05-24T21:31:46Z" uid="12781275" user="4TL4S" version="36" visible="true">
<nd ref="271534238"></nd>
<nd ref="271534237"></nd>
<nd ref="301605624"></nd>
<nd ref="8130722656"></nd>
<nd ref="271534236"></nd>
<nd ref="301605886"></nd>
<nd ref="8490482530"></nd>
<nd ref="271534235"></nd>
<nd ref="8130722659"></nd>
<nd ref="297808621"></nd>
<nd ref="5120247163"></nd>
<nd ref="8500986642"></nd>
<nd ref="8112567831"></nd>
<nd ref="8336910886"></nd>
<nd ref="8336910883"></nd>
<nd ref="8336910885"></nd>
<nd ref="8112567832"></nd>
<nd ref="8336910884"></nd>
<nd ref="8336910887"></nd>
<nd ref="271534230"></nd>
<nd ref="8112567834"></nd>
<nd ref="8762697298"></nd>
<nd ref="8112567833"></nd>
<nd ref="6348455382"></nd>
<tag k="highway" v="secondary"></tag>
<tag k="lanes" v="3"></tag>
<tag k="name" v="Avenida Acueducto de Guadalupe"></tag>
<tag k="oneway" v="yes"></tag>
<tag k="surface" v="asphalt"></tag>
</way>

From this result I would like to get rid of the preceding lines and keep only the tag text, something like:

     k="highway" v="secondary"
     k="lanes" v="3"
     k="name" v="Avenida Acueducto de Guadalupe"
     k="oneway" v="yes"
     k="surface" v="asphalt"

This is a very large file, so I need to loop through it and then turn the result into a table to process with pandas. I haven't figured out how to do this; please help.

Solution

One example of how to create a pandas DataFrame from the HTML file (your_file.html contains the HTML from the question):

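import pandas as pd
from bs4 import BeautifulSoup

# Open the file in a context manager so it is closed after parsing.
# encoding="utf-8" is an assumption about how the file was saved.
with open("your_file.html", "r", encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")

# Build one dict per <way>, mapping each <tag>'s k attribute to its v attribute.
data = []
for way in soup.select("way"):
    data.append({tag["k"]: tag["v"] for tag in way.select("tag")})

# Keys missing from a way become NaN; replace them with empty strings.
df = pd.DataFrame(data).fillna("")
print(df)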

Prints:

     highway lanes                                               name oneway  surface          addr:city     amenity                         operator building       created_by
0  secondary     3                     Avenida Acueducto de Guadalupe    yes  asphalt                                                                                          
1  secondary     3            Avenida Instituto Politécnico Nacional    yes  asphalt
2    service                                                             yes  asphalt                                                                                          
3   tertiary     2                                    Calzada Ticomán     no  asphalt
4                    Centro de Investigación y de Estudios Avanzad...         asphalt  Ciudad de Mèxico  university  Instituto Politécnico Nacional
5                               Patología Experimental y Fisiología                                                                                      yes  Merkaartor 0.11
6                                                         Fisiología                                                                                      yes  Merkaartor 0.11
7                                                         Bioquímica                                                                                      yes  Merkaartor 0.11
8                                    Genética, Patología 1a sección                                                                                      yes  Merkaartor 0.11
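
The table above contains every way, while the question asks only for the secondary ones. A minimal sketch of one way to filter while parsing follows. The SAMPLE string, the secondary_ways helper and the extra "id" column are illustrative assumptions, not part of the original code.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical two-way sample standing in for the real file.
SAMPLE = """
<way id="1"><nd ref="10"/><tag k="highway" v="secondary"/><tag k="name" v="Avenida A"/></way>
<way id="2"><nd ref="11"/><tag k="highway" v="service"/></way>
"""

def secondary_ways(markup):
    """Return a DataFrame with one row per way tagged highway=secondary."""
    soup = BeautifulSoup(markup, "html.parser")
    rows = []
    for way in soup.select("way"):
        # Collect the k/v pairs of this way, ignoring its <nd> children.
        tags = {tag["k"]: tag["v"] for tag in way.select("tag")}
        if tags.get("highway") == "secondary":
            # Keep the way id as well so rows can be traced back to the file.
            rows.append({"id": way["id"], **tags})
    return pd.DataFrame(rows).fillna("")

df_secondary = secondary_ways(SAMPLE)
print(df_secondary)
```

If the full DataFrame has already been built, the same selection can be made afterwards with df[df["highway"] == "secondary"].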
Answered By: Anonymous
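
Because the question mentions a very large file, loading the whole document into Beautiful Soup may use a lot of memory. The sketch below shows an alternative that swaps in the standard library's xml.etree.ElementTree.iterparse to read the file incrementally. It assumes the file is well-formed XML with a single root element (the extract in the question omits the root), and the iter_way_tags helper and the in-memory sample are hypothetical names used for illustration.

```python
import io
import xml.etree.ElementTree as ET

def iter_way_tags(source, highway=None):
    """Yield one {k: v} dict per <way>, reading the XML incrementally.

    source is a filename or a binary file object. If highway is given,
    only ways whose highway tag equals that value are yielded.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag not in ("node", "way", "relation"):
            continue  # nd/tag children are read together with their parent
        tags = None
        if elem.tag == "way":
            tags = {t.get("k"): t.get("v") for t in elem.iter("tag")}
        elem.clear()  # drop this element's children and attributes
        if tags is not None and (highway is None or tags.get("highway") == highway):
            yield tags

# Hypothetical in-memory sample; a real call would pass the file path instead.
sample = io.BytesIO(
    b'<osm>'
    b'<node id="1" lat="19.5" lon="-99.1"/>'
    b'<way id="10"><nd ref="1"/><tag k="highway" v="secondary"/>'
    b'<tag k="lanes" v="3"/></way>'
    b'<way id="11"><nd ref="1"/><tag k="building" v="yes"/></way>'
    b'</osm>'
)
rows = list(iter_way_tags(sample, highway="secondary"))
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame(rows).fillna("") in the same way as the data list in the answer.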

Disclaimer: This content is shared under the Creative Commons license CC BY-SA 3.0. It originates from the Stack Exchange network.
