    Topic: Scraping group title from Facebook (Read 731 times)


    Ambokile

    • Member
    • Posts: 3,009
    • Thanks: +0/-0
    Scraping group title from Facebook
    « on: April 27, 2015, 10:47:32 AM »
    I'm trying to get some information from this page:
    https://www.facebook.com/groups/574916095895510/?fref=ts

    When you click "See all", the members list is displayed. I'm trying to get the following text: "Members of UCLA class of 2018 Official Group".

    However, I get the following error when I try to print the text from the XPath:

    Code: [Select]
    Traceback (most recent call last):
        File "scraper.py", line 35, in <module>
            print title.text()
    AttributeError: 'NoneType' object has no attribute 'text'

    When I dump the session as an image, the title is visible, so I know that the text is available for scraping.

    Here is my full code:

    Code: Python
        import time
        import dryscrape

        username = 'USERNAME'
        password = 'PASSWORD'

        # set up a web scraping session
        sess = dryscrape.Session(base_url='https://www.facebook.com')

        # visit the login page; the "next" parameter sends us to the group page after login
        print("Logging in...")
        sess.visit('/login.php?next=https%3A%2F%2Fwww.facebook.com%2Fgroups%2F574916095895510%2F%3Ffref%3Dts')

        # set username and password
        username_field = sess.at_css('#email')
        password_field = sess.at_css('#pass')
        username_field.set(username)
        password_field.set(password)

        # submit the login form
        username_field.form().submit()

        # wait for the group page to load
        time.sleep(3)

        print("Viewing all members...")
        see_all_button = sess.at_xpath('//*[@id="pagelet_group_profile_members"]/div/div/div/div[1]/div/a')
        see_all_button.click()

        time.sleep(3)

        # read the title of the members list
        title = sess.at_xpath('//*[@id="u_z_0"]/div/div[1]/h3')
        print(title.text())

        # save a screenshot of the current page
        sess.render('fb.png')
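The script waits with fixed time.sleep(3) calls and then uses the result of at_xpath() without checking it; dryscrape's at_xpath() returns None when no node matches, which is what produces the AttributeError above. A minimal sketch of a polling helper (plain Python 3; the name wait_for is hypothetical and not part of dryscrape) retries a lookup until it returns something other than None or a timeout expires. It cannot rescue a selector that never matches, such as an id that changes on every page load, but it turns a silent None into an explicit error.

```python
import time


def wait_for(find, timeout=10.0, interval=0.5):
    """Call find() until it returns something other than None.

    Raises RuntimeError if nothing turns up within `timeout` seconds,
    instead of letting a later `.text()` call fail on None.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = find()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise RuntimeError("element not found within %.1f seconds" % timeout)
        time.sleep(interval)


if __name__ == "__main__":
    # Stand-in for sess.at_xpath(...): returns None twice, then a value.
    attempts = []

    def fake_find():
        attempts.append(1)
        return "found" if len(attempts) >= 3 else None

    print(wait_for(fake_find, timeout=5, interval=0.01))
```

In the script above this would replace a sleep-then-lookup pair with something like `wait_for(lambda: sess.at_xpath(query))`, where `query` is the XPath string.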

    Here is the rendered image; as you can see, the title is visible:


    HcoJustin

    • Member
    • Posts: 2,400
    • Thanks: +0/-0
    Re: Scraping group title from Facebook
    « Reply #1 on: April 27, 2015, 11:58:03 AM »
    The id of the element you are trying to select changes every time you view the page. You'll either have to pull your data from the <title> element in the page's <head>, or use the Graph API: https://www.facebook.com/unsupportedbrowser
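Pulling the data from the head could look roughly like this: a small Python 3 sketch that uses only the standard library's html.parser to read the text of the first <title> element. It assumes you already have the page HTML as a string (with dryscrape, sess.body() returns the current page source); whether Facebook's <title> for a group page actually contains the group name is an assumption to check against the real page. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # data may arrive in several chunks, so accumulate it
        if self._in_title:
            self.title += data


def page_title(html):
    """Return the stripped text of the first <title>, or "" if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


if __name__ == "__main__":
    # Hypothetical markup; the real page's <title> text is an assumption.
    sample = ("<html><head><title>UCLA class of 2018 Official Group</title>"
              "</head><body></body></html>")
    print(page_title(sample))
```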

    If you use the API, you'll get a response similar to this:
    Code: [Select]
    {
      "description": "Welcome UCLA Class of 2018! This group here to serve as a resource for prospective and accepted students to network and have fun!\n\nEnjoy the group and feel free to invite other friends attending UCLA! \n\nPlease be aware of the following:\nThis group is not affiliated with UCLA, nor is it in any way sponsored or endorsed or created on authority of a university department or administrative unit. This group is merely a meeting space for future students. The group title states \"Official\" This is not meant to imply an affiliation of this group to the university.  ",
      "email": "[email protected]",
      "icon": "https://www.facebook.com/unsupportedbrowser",
      "name": "UCLA class of 2018 Official Group",
      "owner": {
        "id": "693624504070171",
        "name": "Dave Lichtenberger"
      },
      "privacy": "CLOSED",
      "updated_time": "2015-04-27T06:47:36+0000",
      "id": "5749160958955"
    }
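As a rough sketch of the Graph API route (Python 3, standard library only), the helpers below build the URL for a GET request on the group node and read the "name" field out of a response shaped like the one above. The group id is taken from the login URL in the original script; the access token is a placeholder and the helper names are hypothetical. The HTTP request itself (for example with urllib.request.urlopen) is left out so the parsing can be checked offline, and you may need to add a version segment to the path depending on your app settings.

```python
import json
from urllib.parse import urlencode

GRAPH_ROOT = "https://graph.facebook.com"  # Graph API host


def group_url(group_id, access_token, fields=None):
    """Build the URL for GET /{group-id}, optionally restricting the fields returned."""
    params = {"access_token": access_token}
    if fields:
        params["fields"] = ",".join(fields)
    return "%s/%s?%s" % (GRAPH_ROOT, group_id, urlencode(params))


def group_name(response_text):
    """Return the "name" field from a JSON group response."""
    return json.loads(response_text)["name"]


if __name__ == "__main__":
    print(group_url("574916095895510", "xxx", fields=["name", "privacy"]))
    sample = '{"name": "UCLA class of 2018 Official Group", "privacy": "CLOSED"}'
    print(group_name(sample))
```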

    « Last Edit: April 27, 2015, 12:02:03 PM by HcoJustin »

    justaguy

    • Member
    • Posts: 707
    • Thanks: +0/-0
    Re: Scraping group title from Facebook
    « Reply #2 on: April 27, 2015, 12:11:48 PM »
    Your XPath query gives me something different from what Firebug displays. The id you supplied appears to change: I get u_g_0 rather than u_z_0 (the trailing 0 might change too, since Firebug shows several divs with ids like u_r_2c). Since Facebook's page structure is very dynamic, you're going to have to be as specific as possible about the structure instead of relying on the generated id. Try:
    Code: [Select]
    //div[starts-with(@id, "u_")]/div[@class="clearfix"]/div/h3
    As an aside: using the Facebook API, as HcoJustin pointed out, is a better idea.
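dryscrape runs XPath queries inside WebKit, so it should accept the starts-with() query above directly in sess.at_xpath(). To show what the expression matches, here is a sketch that emulates it with the standard library's xml.etree.ElementTree, whose limited XPath support has no starts-with(), so the id prefix test is done in Python. The sample markup is made up, and ElementTree only parses well-formed XML, so this illustrates the query rather than being a way to parse real Facebook pages; find_group_headings is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def find_group_headings(root):
    """Emulate //div[starts-with(@id, "u_")]/div[@class="clearfix"]/div/h3."""
    matches = []
    for outer in root.iter("div"):
        # starts-with(@id, "u_"), done in Python
        if not outer.get("id", "").startswith("u_"):
            continue
        # /div[@class="clearfix"] (exact class match, as in the XPath)
        for clearfix in outer.findall("div[@class='clearfix']"):
            # /div/h3
            matches.extend(clearfix.findall("div/h3"))
    return matches


if __name__ == "__main__":
    # Hypothetical, well-formed snippet shaped like the structure described above.
    doc = ET.fromstring(
        '<html><body>'
        '<div id="u_g_0"><div class="clearfix"><div>'
        '<h3>Members of UCLA class of 2018 Official Group</h3>'
        '</div></div></div>'
        '</body></html>'
    )
    for h3 in find_group_headings(doc):
        print("".join(h3.itertext()))
```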
    RIP

    Ambokile

    • Member
    • Posts: 3,009
    • Thanks: +0/-0
    Re: Scraping group title from Facebook
    « Reply #3 on: April 27, 2015, 12:12:50 PM »
    My goal with the project is to get a list of all members of the group. Is that possible with the Graph API?

    justaguy

    • Member
    • Posts: 707
    • Thanks: +0/-0
    Re: Scraping group title from Facebook
    « Reply #4 on: April 27, 2015, 12:14:16 PM »
    https://stackoverflow.com/questions/4087227/list-the-members-of-a-facebook-group-via-api

    Ambokile

    • Member
    • Posts: 3,009
    • Thanks: +0/-0
    Re: Scraping group title from Facebook
    « Reply #5 on: April 27, 2015, 12:49:48 PM »
    I've created an access token, but when I visit this link:
    _token=xxx

    I get the following response:
    Code: [Select]
    {
       "data": [
         
       ]
    }

    HcoJustin

    • Member
    • Posts: 2,400
    • Thanks: +0/-0
    Re: Scraping group title from Facebook
    « Reply #6 on: April 28, 2015, 01:27:06 AM »
    If the group is private, you won't be able to get access to the member list until you become a member yourself. If you can't/don't want to join the group, you'll probably have to resort to scraping. As justaguy pointed out, the page structure is very dynamic. It is going to be difficult to come up with a query that will find what you are looking for.

    As stated by the docs: https://www.facebook.com/unsupportedbrowser
    Quote
    A user access token, for a member of the group, with user_groups permission is required.
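If you do have a token from a group member, fetching the member list could be sketched as below (Python 3, standard library only). It assumes the /{group-id}/members edge and the usual Graph API list shape, a "data" array plus an optional "paging" object with a "next" URL, as the API worked when this thread was written; Facebook has since restricted access to group data, so treat it as historical. The fetch argument is a hypothetical stand-in for whatever HTTP call you use, which keeps the paging logic testable offline. A response of {"data": []} with no "error" key, like the one posted earlier, simply yields an empty list, so an empty result does not by itself tell you whether access was denied.

```python
import json
from urllib.parse import urlencode


def fetch_all_members(group_id, access_token, fetch):
    """Collect every entry from GET /{group-id}/members, following paging.next.

    `fetch` is any callable that takes a URL and returns the response body
    as a string (e.g. a thin wrapper around urllib.request.urlopen).
    """
    url = "https://graph.facebook.com/%s/members?%s" % (
        group_id, urlencode({"access_token": access_token}))
    members = []
    while url:
        page = json.loads(fetch(url))
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "Graph API error"))
        members.extend(page.get("data", []))
        # the last page has no "next" link, which ends the loop
        url = page.get("paging", {}).get("next")
    return members


if __name__ == "__main__":
    # Fake responses standing in for the real API (hypothetical data).
    def fake_fetch(url):
        if "after=" not in url:
            return json.dumps({"data": [{"id": "1", "name": "A"}],
                               "paging": {"next": url + "&after=abc"}})
        return json.dumps({"data": [{"id": "2", "name": "B"}], "paging": {}})

    print([m["name"] for m in fetch_all_members("123", "xxx", fake_fetch)])
```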

     

    Copyright © 2017 MoparScape. All rights reserved.