While crawling a website for data analysis, the HTML loads, but the .select('table') method I always use finds nothing.

Asked 2 weeks ago, Updated 2 weeks ago, 1 views

### Import pandas
### Store the URL of the page to crawl in a variable called url

import pandas as pd 

url = 'https://api.xangle.io/project/exchange/price?lang=ko&currency=krw&page=0&items_per_page=50'

### Import requests
import requests

### Print the URL to check that it is correct
print(url)

### Specify the request headers
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36'}

### Verify that headers is a dict
type(headers)

### Send a GET request with requests
response = requests.get(url, headers=headers)

### Get the response body as text
response.text

### Parse the response text with BeautifulSoup and the lxml parser
from bs4 import BeautifulSoup as bs
html = bs(response.text, 'lxml')
html

### Select the table elements from the HTML

### This is where the problem occurs: the HTML loads, but len(table) is 0.
### I suspect the data is not in a table element, but I don't know how to solve it, so I am posting a question.

table = html.select('table')
len(table)

Question: The HTML loads, but len(table) comes out as 0. I suspect the data is not in a table element, but I don't know what to do, so I'm asking the experts here for help.

URL: https://ko.xangle.io/project/list

beautifulsoup requests html table python

2022-09-20 15:39

1 Answer

The loaded HTML may not contain a table at all. If you open the URL and view the page source, this is what you get.

<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta http-equiv=X-UA-Compatible content="IE=edge"><meta name=robots content="index, follow"><meta name=viewport content="width=device-width,initial-scale=1,maximum-scale=5"><meta name=msapplication-TileColor content=#ffffff><meta name=theme-color content=#ffffff><meta property=fb:app_id content=2227843314148847><script>(function(w,d,s,l,i){w[l]=w[l]||[];w[l].push({'gtm.start':
        new Date().getTime(),event:'gtm.js'});var f=d.getElementsByTagName(s)[0],
      j=d.createElement(s),dl=l!='dataLayer'?'&l='+l:'';j.async=true;j.src=
      'https://www.googletagmanager.com/gtm.js?id='+i+dl;f.parentNode.insertBefore(j,f);
    })(window,document,'script','dataLayer','GTM-N83PQPN');</script><meta name=naver-site-verification content=b85c05f9817a2feb17a00ce5e3d65d3819e10a3f><link rel=icon href=/favicon.ico><link rel=apple-touch-icon sizes=180x180 href=/apple-touch-icon.png><link rel=icon type=image/png sizes=32x32 href=/favicon-32x32.png><link rel=icon type=image/png sizes=16x16 href=/favicon-16x16.png><link rel=manifest href=/site.webmanifest><link rel=mask-icon href=/safari-pinned-tab.svg color=#4219ad><script src=https://developers.kakao.com/sdk/js/kakao.min.js></script><script type=application/ld+json>{
            "@context": "http://schema.org",
            "@type": "Person",
            "name": "Xangle",
            "url": "https://xangle.io",
            "sameAs": [
                "https://twitter.com/CA_disclosure"
            ]
        }</script><script>var protect_id = 'c491';</script><script async src=//script.boraware.kr/protect_script_ads.js></script><link href=/css/app.c407945d.css rel=preload as=style><link href=/css/node_modules.4d973939.css rel=preload as=style><link href=/css/vuetify.fd8e09c6.css rel=preload as=style><link href=/js/app.7ee34fb3.js rel=preload as=script><link href=/js/node_modules.7235d617.js rel=preload as=script><link href=/js/vuetify.44ff6af3.js rel=preload as=script><link href=/css/vuetify.fd8e09c6.css rel=stylesheet><link href=/css/node_modules.4d973939.css rel=stylesheet><link href=/css/app.c407945d.css rel=stylesheet></head><body class=hide-overlay><noscript><iframe src="https://www.googletagmanager.com/ns.html?id=GTM-N83PQPN" height=0 width=0 style=display:none;visibility:hidden></iframe></noscript><div id=app></div><noscript><strong>We're sorry but crossangle doesn't work properly without JavaScript enabled. Please enable it to continue.</strong></noscript><script defer src="https://s3.ap-northeast-2.amazonaws.com/service.xangle.io/xi-widget.min.js?xv=8"></script><script async src="https://www.googletagmanager.com/gtag/js?id=UA-132974252-3"></script><script>window.dataLayer = window.dataLayer || [];
  function gtag(){dataLayer.push(arguments);}
  gtag('js', new Date());
  gtag('config', 'UA-132974252-3');</script><script src=//wcs.naver.net/wcslog.js></script><script>if (!wcs_add) var wcs_add={};
  wcs_add["wa"] = "s_15f9acc4cd62";
  if (!_nasa) var _nasa={};
  if(window.wcs){
    wcs.inflow("xangle.io");
    wcs_do(_nasa);
  }</script><script src=/js/vuetify.44ff6af3.js></script><script src=/js/node_modules.7235d617.js></script><script src=/js/app.7ee34fb3.js></script></body></html>

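As you can see, there is no table tag in this source, which is probably why html.select('table') returns an empty list. The body holds only an empty <div id=app></div>, and the noscript message says the site does not work without JavaScript. requests only downloads the source and does not run JavaScript, so anything the page builds afterwards never reaches BeautifulSoup.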
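
To check this without reading the source by eye, here is a minimal sketch that classifies a response body before you try to select tables from it. It uses only the standard library so it runs without extra installs. The helper names (TagCounter, diagnose) and the three sample strings are made up for illustration: spa_shell is a shortened stand-in for the source above, and api_body is a hypothetical payload, since the real response of the API URL in the question is not shown here.

```python
import json
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Count how often each start tag appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so "table" matches <TABLE> too.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def diagnose(body: str) -> str:
    """Return 'json', 'html-with-table', or 'html-no-table' for a response body."""
    try:
        json.loads(body)
        return "json"
    except ValueError:
        pass  # not JSON, so treat the body as HTML
    counter = TagCounter()
    counter.feed(body)
    counter.close()
    return "html-with-table" if counter.counts.get("table", 0) else "html-no-table"


# Shortened stand-in for the page source shown above (not the full document).
spa_shell = (
    "<!DOCTYPE html><html><body><div id=app></div>"
    "<script src=/js/app.js></script></body></html>"
)
# A page that ships its table in the source, for comparison.
static_page = "<html><body><table><tr><td>1</td></tr></table></body></html>"
# Hypothetical JSON payload; the real shape of the API response is unknown.
api_body = '{"data": []}'

print(diagnose(spa_shell))
print(diagnose(static_page))
print(diagnose(api_body))
```

In the question's code you would pass response.text to diagnose. If the result is 'json', response.json() is likely a better starting point than BeautifulSoup. If it is 'html-no-table', the data is probably loaded by JavaScript, and you would need either the endpoint the page calls or a tool that runs a real browser.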


2022-09-20 15:39



© 2022 pinfo. All rights reserved.