Fix Code Error

Best way to extract messy HTML tables using BeautifulSoup

June 20, 2021 by Code Error
Posted By: Anonymous

I am trying to extract a table from an HTML file. The table looks like this:

Form 990 FYE    Date Published    Overall Score    Stars
CN 2.1
2019-06         12/23/2020        96.98
2017-06         05/01/2018        97.46
2016-06         06/01/2017        100.00
2015-06         07/01/2016        99.98
2015-06         06/01/2016        97.87

CN 2.0
2015-06         04/01/2016        95.22
2014-06         10/01/2015        94.56
2014-06         09/01/2015        86.22
2013-06         02/01/2014        95.01
2012-06         09/01/2013        95.24
2012-06         07/01/2013        88.04
2011-06         12/01/2012        99.13
2011-06         04/01/2012        92.17
2010-06         09/20/2011        92.17

The table HTML looks like this:

<table class="summaryPage ratings" width="100%">
    <tr>
        <th align="left" scope="col">Form 990 FYE</th>
        <th align="left" scope="col">Date Published</th>
        <th align="center" scope="col">Overall Score</th>
        <th scope="col" style="text-align: center;">Overall Rating</th>
    </tr>
    <tr class="methodology-2-1 current">
        <td colspan="10">
            <b><a href="/index.cfm?bay=content.view&amp;cpid=2200">CN 2.1</a></b>
        </td>
    </tr>
    <tr class="current">
        <td>
            2019-06
        </td>
        <td>
            12/23/2020
        </td>
        <td align="center">96.98</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-1">
        <td>
            2017-06
        </td>
        <td>
            05/01/2018
        </td>
        <td align="center">97.46</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-1">
        <td>
            2016-06
        </td>
        <td>
            06/01/2017
        </td>
        <td align="center">100.00</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-1">
        <td>
            2015-06
        </td>
        <td>
            07/01/2016
        </td>
        <td align="center">99.98</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-1">
        <td>
            <span id="cf_tooltip_28842661508586">
                2015-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
            </span>
        </td>
        <td>
            06/01/2016
        </td>
        <td align="center">97.87</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td colspan="10"></td>
    </tr>
    <tr class="">
        <td colspan="10">
            <b><a href="/index.cfm?bay=content.view&amp;cpid=2200">CN 2.0</a></b>
        </td>
    </tr>
    <tr class="">
        <td>
            2015-06
        </td>
        <td>
            04/01/2016
        </td>
        <td align="center">95.22</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            2014-06
        </td>
        <td>
            10/01/2015
        </td>
        <td align="center">94.56</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            <span id="cf_tooltip_28842661508587">
                2014-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
            </span>
        </td>
        <td>
            09/01/2015
        </td>
        <td align="center">86.22</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>three stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#fff" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459" stroke="#CDCCCC"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            2013-06
        </td>
        <td>
            02/01/2014
        </td>
        <td align="center">95.01</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            2012-06
        </td>
        <td>
            09/01/2013
        </td>
        <td align="center">95.24</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            <span id="cf_tooltip_28842661508588">
                2012-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
            </span>
        </td>
        <td>
            07/01/2013
        </td>
        <td align="center">88.04</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>three stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#fff" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459" stroke="#CDCCCC"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            2011-06
        </td>
        <td>
            12/01/2012
        </td>
        <td align="center">99.13</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            <span id="cf_tooltip_28842661508589">
                2011-06 <span style="color: grey;"><i aria-hidden="true" class="fa fa-info-circle"></i></span>
            </span>
        </td>
        <td>
            04/01/2012
        </td>
        <td align="center">92.17</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
    <tr class="methodology-2-0">
        <td>
            2010-06
        </td>
        <td>
            09/20/2011
        </td>
        <td align="center">92.17</td>
        <td align="center">
            <?xml version="1.0" encoding="utf-8"?>
            <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
            <svg class="stars" enable-background="new 0 0 61 15" version="1.1" viewbox="0 0 61 15" x="0px" xml:space="preserve" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" y="0px">
                <title>four stars</title>
                <g>
                    <g>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="12.14,15 10.37,9.27 15,5.72 9.27,5.73 7.5,0 5.729,5.73 0,5.73 4.64,9.27 2.87,15 7.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="27.14,15 25.37,9.27 30,5.72 24.27,5.73 22.5,0 20.729,5.73 15,5.73 19.64,9.27 17.87,15 22.5,11.459"></polygon>
                        <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="58.141,15 56.369,9.27 61,5.72 55.27,5.73 53.5,0 51.73,5.73 46,5.73 50.641,9.27 48.869,15 53.5,11.459"></polygon>
                    </g>
                    <polygon clip-rule="evenodd" fill="#3499CD" fill-rule="evenodd" points="42.141,15 40.369,9.27 45,5.72 39.27,5.73 37.5,0 35.73,5.73 30,5.73 34.641,9.27 32.869,15 37.5,11.459"></polygon>
                </g>
            </svg>
        </td>
    </tr>
</table>

Notice that the table itself is simple, but the HTML is somewhat messy. The data for the Stars column sits in the title element inside each svg class="stars" block; everything else is in the td cells of rows such as tr class="methodology-2-0". I would like to extract the table and store it. Since I will do this for a few thousand files, I am wondering what the best method is. My desired output would look like this:

Form 990 FYE    Date Published    Overall Score    Stars      CN

2019-06         12/23/2020        96.98            X stars    CN 2.1
2017-06         05/01/2018        97.46            Y star     CN 2.0
2016-06         06/01/2017        100.00           ....       ......

The first approach I found here did not work when I adapted it:

sumtab= soup.find('table',class_='summaryPage ratings')
sumdf = pd.DataFrame(columns=['Form 990 FYE','Date Published','Overall Score','Overall Rating'])

for row in sumtab.find_all('tr'):
    cols = row.find_all('td')
    row_list = [ data.text for data in cols ]
    temp_df = pd.DataFrame([row_list], columns = ['Form 990 FYE','Date Published','Overall Score','Overall Rating'])
    sumdf = sumdf.append(temp_df).reset_index(drop = True)


sumdf = sumdf.iloc[1:, :] 

The following attempt also does not work:

table = pd.read_html(soup.find(class_="summaryPage ratings"))
print(table)

Do you have any suggestions?

Solution

You could store the CN label in a variable whenever you encounter it while iterating over the table rows, and append the current CN value to each data row's list:

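from bs4 import BeautifulSoup
import pandas as pd

# your_html is the page source as a string; name the parser explicitly
soup = BeautifulSoup(your_html, 'html.parser')

lists = []
cn = None  # most recent "CN x.y" section label

for row in soup.find_all('tr'):
    cols = row.find_all('td')
    c = [i.text.strip() for i in cols]

    if len(c) == 1:
        # single-cell row: a section label (or an empty separator row)
        cn = c[0]
    elif len(c) > 1:
        # data row: append the current section label as the last column
        c = c + [cn]
        lists.append(c)

df = pd.DataFrame(lists, columns=['Form 990 FYE', 'Date Published', 'Overall Score', 'Stars', 'CN'])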

Result:

  Form 990 FYE Date Published Overall Score       Stars      CN
0      2019-06     12/23/2020         96.98  four stars  CN 2.1
1      2017-06     05/01/2018         97.46  four stars  CN 2.1
2      2016-06     06/01/2017           100  four stars  CN 2.1
3      2015-06     07/01/2016         99.98  four stars  CN 2.1
4      2015-06     06/01/2016         97.87  four stars  CN 2.1
Answered By: Anonymous
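
Since the question mentions a few thousand files, the same idea could be wrapped in a function that is called once per file. The following is only a sketch and not part of the original answer: the function name parse_ratings is hypothetical, it assumes every data row has exactly four td cells as in the sample HTML, and it reads the star rating from the title element inside the SVG rather than from the cell text. It uses only BeautifulSoup with the built-in html.parser.

```python
from bs4 import BeautifulSoup


def parse_ratings(html):
    """Return one dict per data row of the 'summaryPage ratings' table.

    Hypothetical helper: assumes the row layout shown in the question.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="summaryPage ratings")
    if table is None:
        # This file has no ratings table.
        return []

    rows = []
    cn = None  # most recent "CN x.y" section label
    for tr in table.find_all("tr"):
        cells = tr.find_all("td")  # the header row only has <th>, so it is skipped
        if len(cells) == 1:
            # Section row: either a "CN x.y" label or an empty separator.
            label = cells[0].get_text(strip=True)
            if label:
                cn = label
        elif len(cells) == 4:
            # The rating text lives in the <title> element inside the SVG.
            title = cells[3].find("title")
            rows.append({
                "Form 990 FYE": cells[0].get_text(strip=True),
                "Date Published": cells[1].get_text(strip=True),
                "Overall Score": cells[2].get_text(strip=True),
                "Stars": title.get_text(strip=True) if title else None,
                "CN": cn,
            })
    return rows
```

The returned list of dicts can be passed directly to pd.DataFrame. For many files, one option is to collect the rows from every file into a single list (adding a column for the file name) and build one DataFrame at the end, which avoids appending to a DataFrame inside the loop.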

Disclaimer: This content is shared under creative common license cc-by-sa 3.0. It is generated from StackExchange Website Network.
