2010-01-09

Curl/Fetch question - Referer, POST data, Cookies

A curl question (and yes, I've searched Google, this site, and the docs!).

I'm fairly new to curl. I'm trying to use curl from the command line (Linux) to fetch pages from a university website (SJSU). I can more or less get the 1st page, but the following pages rely on referers/cookies, and I think that's where things go wrong.

I'm also dealing with extracting data from frames on the given page. If I parse the page and combine that with the LiveHTTPHeaders data, this shouldn't be a problem.

I've tried using --cookie-jar as well as --cookie, in various combinations with "-e" for the referer and "-d" for the post data.

I'm posting the LiveHTTPHeaders data for the pages, as well as the test shell script I'm using. Any thoughts/pointers would be greatly appreciated.

Thanks

-Bruce

** This is the data shown when I click the "submit" button on the location/term page to get to the "class select" page. As I understand it, curl should more or less replicate this process, using the referer, the post data, and the target HTTP URL.

LiveHTTPHeaders data:

https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL 



POST /psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL HTTP/1.1 

Host: cmshr.sjsu.edu 

User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118  Fedora/3.0.11-1.fc9 Firefox/3.0.11 

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8 

Accept-Language: en-us,en;q=0.5 

Accept-Encoding: gzip,deflate 

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 

Keep-Alive: 300 

Connection: keep-alive 

Referer: https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?FolderPath=PORTAL_ROOT_OBJECT.PA_HC_CLASS_SEARCH&PortalActualURL=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2fEMPLOYEE%2fHSJPRD%2fc%2fCOMMUNITY_ACCESS.CLASS_SEARCH.GBL&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsp%2fHSJPRDF%2f&PortalURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2f&PortalHostNode=HRMS&NoCrumbs=yes 

Cookie: [email protected]*[email protected]*[email protected]*[email protected]*[email protected]*[email protected]*[email protected]*[email protected]*[email protected]*[email protected]*; cssln164HSJ1-84915=B1h9LG2L1yydzpj3TvqQrHBHKdbWpnl8!695030755; ExpirePage=https://cmshr.sjsu.edu/psp/HSJPRDF/; PS_LOGINLIST=https://cmshr.sjsu.edu/HSJPRDF; PS_TOKENEXPIRE=9_Jan_2010_18:06:54_GMT; SignOnDefault=CMSPUBLIC; cssln169HSJ1-84915=JpgTLL9RJVQPWjphl3Vybx1xnvW8zbMl!-1963651055; cssln162HSJ1-84915=1LrQLH7b20GmB0kx50cf54GPGjj97WWM!813942549; cssln118HSJ1-84915=xbDyLLFZZCTbpLFhJQpCNwNF2ppgjH4y!-923260117; PS_TOKEN=AAAApgECAwQAAQAAAAACvAAAAAAAAAAsAARTaGRyAgBOcQgAOAAuADEAMBQLu6wR/++AI1n7RObpon/kagLJUAAAAGYABVNkYXRhWnicHclNDkBADIbhd4ZYWbkHYYKMrSF+gghxEPdzOJ3plz5tWuBVOopRSOnPm+HYuTl56NlYcAkjB1PKLPdVPhdDjaGkks7D9FqxoqEQjdiEvQtpJZYfnXoL+Q== 

Content-Type: application/x-www-form-urlencoded 

Content-Length: 339 

ICType=Panel&ICElementNum=0&ICStateNum=2&ICAction=CLASS_SRCH_WRK2_SSR_PB_SRCH%2457%24&ICXPos=0&ICYPos=0&ICFocus=&ICSaveWarningFilter=0&ICChanged=-1&ICResubmit=0&ICSID=HpLTZLhQFp4p&CLASS_SRCH_WRK2_INSTITUTION%2445%24=SJ000&CLASS_SRCH_STRM1=2102&CLASS_SRCH_WRK2_SSR_CLS_SRCH_TYPE%2458%24=06&CLASS_SRCH_WRK2_SSR_CLS_SRCH_TYPE%2458%24%24rad=06 

HTTP/1.x 200 OK 

Cache-Control: no-cache 

Date: Sat, 09 Jan 2010 18:07:47 GMT 

Content-Length: 21137 

Content-Type: text/html; CHARSET=UTF-8 

IgnorePortalRegisteredURL: 1 

PortalRegisteredURL: https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL 

UsesPortalRelativeURL: true 

X-Powered-By: Servlet/2.4 JSP/2.0 

Set-Cookie: PS_TOKENEXPIRE=9_Jan_2010_18:07:47_GMT; domain=.sjsu.edu; path=/; secure 

test Curl shell script: 
============================================= 
#!/bin/sh -v 
# 
# test shell for curl.. 
# 
#curl --cookie lcookie.lwp --cookie-jar lcookie.lwp --output "ctest.dat" -L "http://my.sjsu.edu/" 

#foo="http://my.sjsu.edu/" 

#curl --cookie lcookie.lwp --cookie-jar lcookie.lwp --output "ctest.dat" -L "$foo" 

#exit
curl -v --cookie-jar lcookie.lwp --output "ctest2.dat" https://cmshr.sjsu.edu/psp/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?FolderPath=PORTAL_ROOT_OBJECT.PA_HC_CLASS_SEARCH 
#exit 
curl -v -A "Mozilla/4.73 [en] (X11; U; Linux 2.2.15 i686)" --cookie-jar lcookie.lwp --output "ctest3.dat" -e "https://cmshr.sjsu.edu/psp/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?FolderPath=PORTAL_ROOT_OBJECT.PA_HC_CLASS_SEARCH" -L "https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HRMS/s/WEBLIB_PT_NAV.ISCRIPT1.FieldFormula.IScript_UniHeader_Frame?c=uA%2buCaKuiBh5DTZEFHMBvNKbD7XLjINl&FolderPath=PORTAL_ROOT_OBJECT.PA_HC_CLASS_SEARCH&PortalActualURL=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2fEMPLOYEE%2fHSJPRD%2fc%2fCOMMUNITY_ACCESS.CLASS_SEARCH.GBL&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsp%2fHSJPRDF%2f&PortalURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2f&PortalHostNode=HRMS&PortalIsPagelet=true&NoCrumbs=yes" 

#get the page with the term/location --- uses the get 
curl -v -A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11" --cookie-jar lcookie.lwp --output "ctest4.dat" -e "https://cmshr.sjsu.edu/psp/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?FolderPath=PORTAL_ROOT_OBJECT.PA_HC_CLASS_SEARCH" -L "https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?FolderPath=PORTAL_ROOT_OBJECT.PA_HC_CLASS_SEARCH&PortalActualURL=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2fEMPLOYEE%2fHSJPRD%2fc%2fCOMMUNITY_ACCESS.CLASS_SEARCH.GBL&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsp%2fHSJPRDF%2f&PortalURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2f&PortalHostNode=HRMS&NoCrumbs=yes" 

# 
# the following two lines are attempts to get the page with the class display... neither one works. 
# -instead, the output files are simply the same as the above page, with the location/term menu.. 
# 
# the page should be a page that lists a class schedule select menu.. 
# 

#get the page with the search class menu... it's a post 
curl -v -A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11" --cookie-jar lcookie.lwp --output "ctest5.dat" -e "Referer: https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL?FolderPath=PORTAL_ROOT_OBJECT.PA_HC_CLASS_SEARCH&PortalActualURL=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2fEMPLOYEE%2fHSJPRD%2fc%2fCOMMUNITY_ACCESS.CLASS_SEARCH.GBL&PortalRegistryName=EMPLOYEE&PortalServletURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsp%2fHSJPRDF%2f&PortalURI=https%3a%2f%2fcmshr.sjsu.edu%2fpsc%2fHSJPRDF%2f&PortalHostNode=HRMS&NoCrumbs=yes" -d "ICType=Panel&ICElementNum=0&ICStateNum=2&ICAction=CLASS_SRCH_WRK2_SSR_PB_SRCH%2457%24&ICXPos=0&ICYPos=0&ICFocus=&ICSaveWarningFilter=0&ICChanged=-1&ICResubmit=0&ICSID=HpLTZLhQFp4p&CLASS_SRCH_WRK2_INSTITUTION%2445%24=SJ000&CLASS_SRCH_STRM1=2102&CLASS_SRCH_WRK2_SSR_CLS_SRCH_TYPE%2458%24=06&CLASS_SRCH_WRK2_SSR_CLS_SRCH_TYPE%2458%24%24rad=06" -L "https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL" 

#get the page with the search class menu... it's a post 
curl -v -A "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.0.11) Gecko/2009061118 Fedora/3.0.11-1.fc9 Firefox/3.0.11" --cookie lcookie.lwp --output "ctest6.dat" -e "Referer: https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL" -d "ICType=Panel&ICElementNum=0&ICStateNum=2&ICAction=CLASS_SRCH_WRK2_SSR_PB_SRCH%2457%24&ICXPos=0&ICYPos=0&ICFocus=&ICSaveWarningFilter=0&ICChanged=-1&ICResubmit=0&ICSID=HpLTZLhQFp4p&CLASS_SRCH_WRK2_INSTITUTION%2445%24=SJ000&CLASS_SRCH_STRM1=2102&CLASS_SRCH_WRK2_SSR_CLS_SRCH_TYPE%2458%24=06&CLASS_SRCH_WRK2_SSR_CLS_SRCH_TYPE%2458%24%24rad=06" -L "https://cmshr.sjsu.edu/psc/HSJPRDF/EMPLOYEE/HSJPRD/c/COMMUNITY_ACCESS.CLASS_SEARCH.GBL" 

Answer


Note that --cookie-jar only saves cookies. To read cookies back from that file, you also need to pass --cookie with the same file as its argument (as in your first two curl lines).
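
As a rough illustration of that point, here is a minimal sketch of a wrapper that passes both options on every request, so cookies set by one page are sent with the next. The names `COOKIE_FILE` and `session_curl` and the `example.invalid` URLs are made up for this example and do not come from the original script. The commented calls also pass the referer to `-e` as a bare URL, since curl adds the `Referer:` header name itself.

```shell
#!/bin/sh
# Sketch only: COOKIE_FILE and session_curl are illustrative names,
# not part of the script in the question.
COOKIE_FILE="${COOKIE_FILE:-lcookie.lwp}"

# Wrap curl so every request sends the stored cookies (--cookie) and
# writes new or updated cookies back to the same file (--cookie-jar).
session_curl() {
    curl --silent --show-error --location \
         --cookie "$COOKIE_FILE" --cookie-jar "$COOKIE_FILE" \
         "$@"
}

# Example calls, commented out so sourcing this file makes no requests.
# The URLs and form field below are placeholders.
# session_curl --output page1.html "https://example.invalid/search"
# session_curl --output page2.html \
#     -e "https://example.invalid/search" \
#     -d "field=value" \
#     "https://example.invalid/search"
```

With this shape, each step of the sequence (first page, term/location page, then the POST) would go through `session_curl`, so the cookie file is both read and updated at every step rather than only written.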