How do I extract only one element when the element I want from the HTML is duplicated?

Asked 1 week ago, Updated 1 week ago, 1 view

I'm asking here because I couldn't solve this even after googling and reading the official documentation. The problem is as follows.

After parsing, I want to output only one link address (the href value) from the output shown below.

Code:

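import requests
from bs4 import BeautifulSoup

site = requests.get("")  # URL omitted in the original post

alba = BeautifulSoup(site.text, 'html.parser')

brands = list(alba.find(id="MainSuperBrand").find('ul', {"class": "goodsBox"}).find_all('a', {"class": "goodsBox-info"}))

for b in brands:
    # b is a Tag, so test the href attribute itself rather than the tag
    if "http" in b.get("href", ""):
        # this line is garbled in the original post; reading the attribute is the assumed intent
        print(b["href"])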

I am trying to extract the href attribute of the first <a> tag from the parsed output below.

<li class="first impact"><div class="B_MyAd_"></div>
<a class="goodsBox-info" href="">
<span class="logo"> <img alt="(Note)" src="//
20200916174910805.gif"/> </span>
<span class="company">Barogo</span>
<span class="title"><span>Barogo Recruitment &lt;National Riders&gt;</span></span>
</a>
<a class="brandHover" href=""></a></li>, ...

(The rest of the list is omitted.) ]

There are two <a> tags with an href under each <li> element. In this case, how can I output only one of them? Is that possible?

html python java scraping

2022-09-20 12:31

4 Answers

Check the type of the value that is returned.

a = soup.find_all('a')
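To illustrate what checking the return type means here, the following is a minimal sketch. The markup is a hypothetical sample modelled on the snippet in the question, and the example.com URLs are placeholders rather than the real site. It only shows that find_all gives back a list-like ResultSet while find gives back a single Tag, which is the object that has an href.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the URLs are placeholders, not the real site.
html = """
<ul class="goodsBox">
  <li class="first impact">
    <a class="goodsBox-info" href="http://example.com/info/1"><span class="company">Barogo</span></a>
    <a class="brandHover" href="http://example.com/hover/1"></a>
  </li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

links = soup.find_all("a")   # ResultSet: a list subclass holding Tag objects
first = soup.find("a")       # a single Tag, or None when nothing matches

print(type(links).__name__)  # name of the collection type
print(type(first).__name__)  # name of the single-element type
print(first["href"])         # attributes are read from a Tag, not from the ResultSet
```

Reading ["href"] on the ResultSet itself would fail, so each element has to be indexed or looped over first.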
Additional note:

It doesn't work? Are you sure? Perhaps you are doing it differently from the way I explained earlier.

aaa is converted to a list here, so how did you end up with a ResultSet as the result?

a = requests.get("")

aa = BeautifulSoup(a.text, 'html.parser')

aaa = list(aa.find(id = "MainSuperBrand").find('ul', {"class" : "goodsBox"}).find_all('a', {"class" : "goodsBox-info"}))

for aaaa in aaa :

2022-09-20 12:31

If you are using Python's Beautiful Soup, there is also a method called find. Look it up.
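As a rough sketch of how find could be used for this, assuming markup shaped like the snippet in the question: the sample HTML and example.com URLs below are made up for illustration, while the id and class names (MainSuperBrand, goodsBox, goodsBox-info, brandHover) are taken from the post.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the URLs are placeholders, not the real site.
html = """
<div id="MainSuperBrand">
  <ul class="goodsBox">
    <li class="first impact">
      <a class="goodsBox-info" href="http://example.com/info/1"><span class="company">Barogo</span></a>
      <a class="brandHover" href="http://example.com/hover/1"></a>
    </li>
    <li>
      <a class="goodsBox-info" href="http://example.com/info/2"><span class="company">Other</span></a>
      <a class="brandHover" href="http://example.com/hover/2"></a>
    </li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
box = soup.find(id="MainSuperBrand").find("ul", class_="goodsBox")

# find() returns only the first matching tag, so exactly one link comes back.
first_link = box.find("a", class_="goodsBox-info")
print(first_link.get("href"))

# One link per <li>: take the goodsBox-info anchor and ignore brandHover.
hrefs = []
for li in box.find_all("li"):
    a = li.find("a", class_="goodsBox-info")
    if a is not None and a.get("href"):
        hrefs.append(a["href"])
print(hrefs)
```

Filtering by the goodsBox-info class is what keeps the second anchor in each item (brandHover) out of the result.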

2022-09-20 12:31

The code is as follows:

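import requests
from bs4 import BeautifulSoup

site = requests.get("")  # URL omitted in the original post

alba = BeautifulSoup(site.text, 'html.parser')

brands = list(alba.find(id="MainSuperBrand").find('ul', {"class": "goodsBox"}).find_all('a', {"class": "goodsBox-info"}))

for b in brands:
    # b is a Tag, so test the href attribute itself rather than the tag
    if "http" in b.get("href", ""):
        # this line is garbled in the original post; reading the attribute is the assumed intent
        print(b["href"])

The parsed HTML to extract from: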

<a class="goodsBox-info" href="">
<span class="logo"> <img alt="Three Great Pigs' Feet" src="//"/> </span>
<span class="company">Three major pigs' feet</span>
<span class="title">
<span>Recruitment of employees and part-timers nationwide</span></span>
<span class="wrap">
<span class="local">National</span>
<span class="pay"><span class="pay Letter">Check by announcement</span>
<span class="payIcon talk"></span></span> </span> </a>

2022-09-20 12:31

I also tried it with the revised code. The link address is printed correctly, but as in the image, the tag's text is printed twice.

That is why I posted the question: I thought I needed to do something more inside the tag to avoid the duplicated content.
I don't know why the tag text is duplicated along with the link address.
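The image is not available here, so the actual cause cannot be confirmed. One possible cause, sketched below under that assumption, is the nested span in the title: looping over every span visits both the outer title span and the inner one, so the same text appears twice. The markup is a trimmed copy of the snippet above with a placeholder URL.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the posted markup; the URL is a placeholder.
html = """
<a class="goodsBox-info" href="http://example.com/info/1">
  <span class="company">Three major pigs' feet</span>
  <span class="title"><span>Recruitment of employees and part-timers nationwide</span></span>
</a>
"""

a = BeautifulSoup(html, "html.parser").find("a", class_="goodsBox-info")

# Looping over every <span> visits the outer title span and the inner span,
# so the same title text is collected twice.
all_span_texts = [s.get_text(strip=True) for s in a.find_all("span")]
print(all_span_texts)

# Selecting each field by its class yields each text once.
company = a.find("span", class_="company").get_text(strip=True)
title = a.find("span", class_="title").get_text(strip=True)
print(a["href"], company, title)
```

If the duplication comes from somewhere else, such as the second brandHover anchor, filtering by class when selecting the anchors would be the place to look instead.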

2022-09-20 12:31


© 2022 pinfo. All rights reserved.