Best way to extract messy HTML tables using BeautifulSoup
Posted By: Anonymous
I am trying to extract a table from an HTML file. The table looks like this:
Form 990 FYE Date Published Overall Score Stars
CN 2.1
2019-06 12/23/2020 96.98
2017-06 05/01/2018 97.46
2016-06 06/01/2017 100.00
2015-06 07/01/2016 99.98
2015-06 06/01/2016 97.87
CN 2.0
2015-06 04/01/2016 95.22
2014-06 10/01/2015 94.56
2014-06 09/01/2015 86.22
2013-06 02/01/2014 95.01
2012-06 09/01/2013 95.24
2012-06 07/01/2013 88.04
2011-06 12/01/2012 99.13
2011-06 04/01/2012 92.17
2010-06 09/20/2011 92.17
The table HTML looks like this:
<table class="summaryPage ratings" width="100%">
<tr>
<th align="left" scope="col">Form 990 FYE</th>
<th align="left" scope="col">Date Published</th>
<th align="center" scope="col">Overall Score</th>
<th scope="col" style="text-align: center;">Overall Rating</th>
</tr>
<tr class="methodology-2-1 current">
<td colspan="10">
<b><a href="/index.cfm?bay=content.view&cpid=2200">CN 2.1</a></b>
</td>
</tr>
<tr class="current">
<td>
2019-06
</td>
<td>
12/23/2020
</td>
<td align="center">96.98</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
2017-06
</td>
<td>
05/01/2018
</td>
<td align="center">97.46</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
2016-06
</td>
<td>
06/01/2017
</td>
<td align="center">100.00</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
2015-06
</td>
<td>
07/01/2016
</td>
<td align="center">99.98</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-1">
<td>
<span id="cf_tooltip_28842661508586">
2015-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
06/01/2016
</td>
<td align="center">97.87</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td colspan="10"></td>
</tr>
<tr class="">
<td colspan="10">
<b><a href="/index.cfm?bay=content.view&cpid=2200">CN 2.0</a></b>
</td>
</tr>
<tr class="">
<td>
2015-06
</td>
<td>
04/01/2016
</td>
<td align="center">95.22</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2014-06
</td>
<td>
10/01/2015
</td>
<td align="center">94.56</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
<span id="cf_tooltip_28842661508587">
2014-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
09/01/2015
</td>
<td align="center">86.22</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>three stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#fff" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459" stroke="#CDCCCC"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2013-06
</td>
<td>
02/01/2014
</td>
<td align="center">95.01</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2012-06
</td>
<td>
09/01/2013
</td>
<td align="center">95.24</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
<span id="cf_tooltip_28842661508588">
2012-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
07/01/2013
</td>
<td align="center">88.04</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>three stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#fff" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459" stroke="#CDCCCC"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2011-06
</td>
<td>
12/01/2012
</td>
<td align="center">99.13</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
<span id="cf_tooltip_28842661508589">
2011-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
</span>
</td>
<td>
04/01/2012
</td>
<td align="center">92.17</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
<tr class="methodology-2-0">
<td>
2010-06
</td>
<td>
09/20/2011
</td>
<td align="center">92.17</td>
<td align="center">
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
<title>four stars</title>
<g>
<g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
</g>
<polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
</g>
</svg>
</td>
</tr>
</table>
Notice that the table is simple, but the HTML is a bit messy. The data for the Stars column is inside the svg class="stars" element, and the rest is in rows like tr class="methodology-2-0". I would like to extract the table and store it. Since I will do this for a few thousand files, I want a method that is reliable. My desired output would look like this:
Form 990 FYE Date Published Overall Score Stars CN
2019-06 12/23/2020 96.98 X stars CN 2.1
2017-06 05/01/2018 97.46 Y star CN 2.0
2016-06 06/01/2017 100.00 .... ......
What is the best way to do this? The first approach I found did not work when I adapted it:
sumtab = soup.find('table', class_='summaryPage ratings')
sumdf = pd.DataFrame(columns=['Form 990 FYE','Date Published','Overall Score','Overall Rating'])
for row in sumtab.find_all('tr'):
    cols = row.find_all('td')
    row_list = [ data.text for data in cols ]
    temp_df = pd.DataFrame([row_list], columns = ['Form 990 FYE','Date Published','Overall Score','Overall Rating'])
    sumdf = sumdf.append(temp_df).reset_index(drop = True)
sumdf = sumdf.iloc[1:, :]
The following attempt also does not work:
table = pd.read_html(soup.find(class_="summaryPage ratings"))
print(table)
Do you have any suggestions?
Solution
You could store the CN value in a variable when you encounter it while iterating over the table rows, and keep appending the current CN value to each data row's list:
from bs4 import BeautifulSoup
import pandas as pd

# your_html holds the page source as a string
soup = BeautifulSoup(your_html, 'html.parser')
lists = []
cn = None
for row in soup.find_all('tr'):
    cols = row.find_all('td')
    c = [i.text.strip() for i in cols]
    if len(c) == 1:
        # single-cell row: a section header such as "CN 2.1"
        cn = c[0]
    elif len(c) > 1:
        # data row: tag it with the most recent CN value
        c = c + [cn]
        lists.append(c)
df = pd.DataFrame(lists, columns = ['Form 990 FYE','Date Published','Overall Score','Stars', 'CN'])
Result:
| | Form 990 FYE | Date Published | Overall Score | Stars | CN |
|---|---|---|---|---|---|
| 0 | 2019-06 | 12/23/2020 | 96.98 | four stars | CN 2.1 |
| 1 | 2017-06 | 05/01/2018 | 97.46 | four stars | CN 2.1 |
| 2 | 2016-06 | 06/01/2017 | 100 | four stars | CN 2.1 |
| 3 | 2015-06 | 07/01/2016 | 99.98 | four stars | CN 2.1 |
| 4 | 2015-06 | 06/01/2016 | 97.87 | four stars | CN 2.1 |
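Since the question mentions a few thousand files, it may help to wrap the same idea in a function. The following is only a sketch under some assumptions: the name parse_ratings_table, the COLUMNS list, and the shortened sample markup are made up for illustration, and it assumes every data row has exactly four td cells. Unlike the loop above, it ignores the empty single-cell spacer row so that it cannot overwrite the current CN value.

```python
from bs4 import BeautifulSoup

# Hypothetical column names, taken from the desired output in the question.
COLUMNS = ["Form 990 FYE", "Date Published", "Overall Score", "Stars", "CN"]


def parse_ratings_table(html):
    """Return one dict per data row, tagged with the CN section it sits under."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="ratings")
    if table is None:
        return []  # page without a ratings table
    rows, cn = [], None
    for tr in table.find_all("tr"):
        # get_text(strip=True) drops the whitespace around each text fragment;
        # the star rating comes from the <title> inside the svg.
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if len(cells) == 1:
            # Section header such as "CN 2.1"; skip empty spacer rows.
            if cells[0]:
                cn = cells[0]
        elif len(cells) == 4:
            rows.append(dict(zip(COLUMNS, cells + [cn])))
    return rows


# Shortened, made-up sample in the same shape as the table in the question.
sample = """
<table class="summaryPage ratings">
<tr><th>Form 990 FYE</th><th>Date Published</th><th>Overall Score</th><th>Overall Rating</th></tr>
<tr><td colspan="10"><b><a href="#">CN 2.1</a></b></td></tr>
<tr><td>2019-06</td><td>12/23/2020</td><td align="center">96.98</td>
<td><svg class="stars"><title>four stars</title></svg></td></tr>
<tr><td colspan="10"></td></tr>
<tr><td colspan="10"><b>CN 2.0</b></td></tr>
<tr><td><span>2014-06 <span><i></i></span></span></td><td>09/01/2015</td><td>86.22</td>
<td><svg class="stars"><title>three stars</title></svg></td></tr>
</table>
"""

for row in parse_ratings_table(sample):
    print(row)
```

The returned list of dicts can be passed straight to pd.DataFrame(...). For many files, you could call the function once per file, collect all rows in a single list, and build the DataFrame once at the end rather than appending to it inside the loop.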
Answered By: Anonymous
Disclaimer: This content is shared under creative common license cc-by-sa 3.0. It is generated from StackExchange Website Network.