Login page [url1]:
https://investorservice.cfmmc.com/

CAPTCHA image [url2]:
https://investorservice.cfmmc.com/veriCode.do

Login submission [url3]:
https://investorservice.cfmmc.com/login.do

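The site uses a CAPTCHA that is hard to recognize programmatically: the last character is too close to the background color to be read reliably, so a person has to type it in by hand.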

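Characteristics of the site:
1. It uses HTTPS.
2. The session ID is stored in a cookie.
3. Requesting the CAPTCHA page does not create a cookie, so before login the only way to obtain the session cookie is to request url1.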
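
The third point is what fixes the order of the requests: the cookie has to come from url1 first and then travel with every later request. As a small sketch of why one `CookieContainer` is enough, the snippet below stores a made-up cookie (the name `SESSIONID` and the value `placeholder` are assumptions, the real ones come from the server) for url1 and asks the container what it would send to the other two URLs on the same host:

```csharp
using System;
using System.Net;

public static class CookieScopeDemo
{
    // Returns the Cookie header the container would attach to a request for the given URL.
    public static string HeaderFor(CookieContainer jar, string url)
    {
        return jar.GetCookieHeader(new Uri(url));
    }

    public static void Main()
    {
        var jar = new CookieContainer();
        // Hypothetical cookie name and value, standing in for whatever url1 really sets.
        jar.Add(new Uri("https://investorservice.cfmmc.com/"),
                new Cookie("SESSIONID", "placeholder", "/"));

        // Both URLs are on the same host and under path "/", so the cookie applies to each.
        Console.WriteLine(HeaderFor(jar, "https://investorservice.cfmmc.com/veriCode.do"));
        Console.WriteLine(HeaderFor(jar, "https://investorservice.cfmmc.com/login.do"));
    }
}
```
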
Approach:
Step 1:
Send a GET request to url1 and keep the cookies it returns.

    CookieContainer cookies = new CookieContainer(); // shared by all three requests

    string url = "https://investorservice.cfmmc.com/";
    HttpWebRequest myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    myHttpWebRequest.Timeout = 20 * 1000; // connection timeout, in milliseconds
    myHttpWebRequest.Accept = "*/*";
    myHttpWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0;)";
    myHttpWebRequest.CookieContainer = cookies; // cookies set by the response are stored here
    myHttpWebRequest.GetResponse().Close();
    string cookiesstr = cookies.GetCookieHeader(myHttpWebRequest.RequestUri); // the cookies as a header string
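
If the cookies are carried around as a header string like the one built on the last line, they have to be turned back into a container before they can be attached to another request. Here is a rough sketch of that conversion. `CookieHeaderParser` is a name I made up, and the sample header in `Main` is invented. It splits each pair on the first `=` only, so values that contain `=` survive.

```csharp
using System;
using System.Net;

public static class CookieHeaderParser
{
    // Rebuilds a CookieContainer for the given URL from a "name=value; name2=value2" string.
    public static CookieContainer Parse(string header, Uri uri)
    {
        var jar = new CookieContainer();
        foreach (string part in header.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            int eq = part.IndexOf('=');
            if (eq <= 0) continue; // skip fragments that have no name
            string name = part.Substring(0, eq).Trim();
            string value = part.Substring(eq + 1).Trim();
            jar.Add(uri, new Cookie(name, value, "/"));
        }
        return jar;
    }

    public static void Main()
    {
        // Invented header string, only for illustration.
        var uri = new Uri("https://investorservice.cfmmc.com/login.do");
        CookieContainer jar = Parse("SESSIONID=abc=def; other=1", uri);
        foreach (Cookie c in jar.GetCookies(uri))
        {
            Console.WriteLine(c.Name + " -> " + c.Value);
        }
    }
}
```
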

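Step 2:
Send a GET request to url2 and save the CAPTCHA image locally. The cookies from step 1 must be sent with this request. Otherwise the CAPTCHA is tied to a different session than the one used at login, and CAPTCHA validation fails. The code: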

    url = "https://investorservice.cfmmc.com/veriCode.do";
    myHttpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    myHttpWebRequest.Timeout = 20 * 1000; // connection timeout, in milliseconds
    myHttpWebRequest.Accept = "*/*";
    myHttpWebRequest.UserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0;)";
    myHttpWebRequest.Method = "GET";
    myHttpWebRequest.CookieContainer = cookies; // send the session cookie from step 1
    HttpWebResponse myHttpWebResponse = (HttpWebResponse)myHttpWebRequest.GetResponse();
    Stream stream = myHttpWebResponse.GetResponseStream();
    // FileMode.Create truncates an existing file, so a smaller image never keeps stale trailing bytes
    FileStream writer = new FileStream(System.Web.HttpContext.Current.Server.MapPath("\\tmp\\vericode.jpg"), FileMode.Create, FileAccess.Write);
    byte[] buff = new byte[512];
    int c = 0; // number of bytes actually read
    while ((c = stream.Read(buff, 0, buff.Length)) > 0)
    {
        writer.Write(buff, 0, c);
    }
    writer.Close();
    stream.Close();
    myHttpWebResponse.Close(); // release the connection
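
The read/write loop above can also be written with `using` blocks and `Stream.CopyTo`, which close the file even if the download throws halfway. This is just a sketch: `StreamSaver` is a name I made up, and a `MemoryStream` stands in for the HTTP response stream so it runs without touching the network.

```csharp
using System;
using System.IO;

public static class StreamSaver
{
    // Copies everything from source into the file at path, replacing any existing file.
    // Returns the number of bytes written.
    public static long SaveTo(Stream source, string path)
    {
        using (var file = new FileStream(path, FileMode.Create, FileAccess.Write))
        {
            source.CopyTo(file);
            return file.Length;
        }
    }

    public static void Main()
    {
        // Four placeholder bytes play the role of the CAPTCHA image.
        string path = Path.Combine(Path.GetTempPath(), "vericode-demo.jpg");
        using (var fake = new MemoryStream(new byte[] { 0xFF, 0xD8, 0xFF, 0xD9 }))
        {
            Console.WriteLine(SaveTo(fake, path) + " bytes saved to " + path);
        }
    }
}
```

In the real flow, `source` would be the stream returned by `GetResponseStream()`.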
        

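Step 3:
Once a person has read the downloaded CAPTCHA image and typed it in, send a POST request to url3.
The request must carry the user name, the password, the CAPTCHA text, and the cookies.
Note that this is HTTPS. The code: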

    System.GC.Collect(); // force a collection to reclaim HTTP connections that were not closed properly
    // Inputs assumed to exist already (this snippet does not define them):
    //   data       - the URL-encoded form body (user name, password, CAPTCHA text)
    //   cookiesstr - the cookie header string obtained in step 1
    url = "https://investorservice.cfmmc.com/login.do";
    string result = ""; // response body
    int timeout = 30;   // seconds
    string charset = "utf-8";
    HttpWebRequest request = null;
    HttpWebResponse response = null;
    Stream reqStream = null;
    try
    {
        // Maximum number of concurrent connections
        ServicePointManager.DefaultConnectionLimit = 200;
        // HTTPS certificate validation; CheckValidationResult is a callback defined elsewhere (not shown)
        if (url.StartsWith("https", StringComparison.OrdinalIgnoreCase))
        {
            ServicePointManager.ServerCertificateValidationCallback =
                    new RemoteCertificateValidationCallback(CheckValidationResult);
        }
        /***************************************************************
        * Configure the HttpWebRequest
        * ************************************************************/
        request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.Timeout = timeout * 1000;
        //// Optional proxy server
        //WebProxy proxy = new WebProxy();                          // proxy object
        //proxy.Address = new Uri(WxPayConfig.PROXY_URL);              // proxy address and port
        //request.Proxy = proxy;
        // Content type and length of the POST body
        request.ContentType = string.Format("application/x-www-form-urlencoded;charset={0}", charset);
        byte[] res = System.Text.Encoding.GetEncoding(charset).GetBytes(data);
        request.ContentLength = res.Length;
        CookieContainer cc = new CookieContainer();
        string[] arr_cookies = cookiesstr.Split(';');
        for (int i = 0; i < arr_cookies.Length; i++)
        {
            // Split on the first '=' only, since a cookie value may itself contain '='
            string[] arr_item = arr_cookies[i].Split(new char[] { '=' }, 2);
            if (arr_item.Length < 2) continue; // skip empty or malformed fragments
            cc.Add(new Uri(url), new Cookie(arr_item[0].Trim(), arr_item[1].Trim()));
        }
        request.CookieContainer = cc;
        // Write the body to the server
        reqStream = request.GetRequestStream();
        reqStream.Write(res, 0, res.Length);
        reqStream.Close();
        // Get the server's response
        response = (HttpWebResponse)request.GetResponse();
        // Read the response body
        StreamReader sr = new StreamReader(response.GetResponseStream(), System.Text.Encoding.GetEncoding(charset));
        result = sr.ReadToEnd().Trim();
        sr.Close();
    }
    catch (Exception)
    {
        // Errors are swallowed here, so result stays empty on failure; log the exception in real code.
    }
    finally
    {
        // Close the response and the connection
        if (response != null)
        {
            response.Close();
        }
        if (request != null)
        {
            request.Abort();
        }
    }
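
The POST body used above has to be URL-encoded, otherwise a password or CAPTCHA containing `&` or `=` would corrupt the form. A minimal sketch of building such a body follows. `FormBody` is a name I made up, and the field names `user`, `pass` and `code` are placeholders: the real names have to be read from the login form on url1.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormBody
{
    // Joins the fields as name=value pairs separated by '&', percent-encoding each part.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var sb = new StringBuilder();
        foreach (var field in fields)
        {
            if (sb.Length > 0) sb.Append('&');
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Placeholder field names and values; the site's real field names are not known here.
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("user", "someone"),
            new KeyValuePair<string, string>("pass", "p&ss=word"),
            new KeyValuePair<string, string>("code", "ab12"),
        };
        Console.WriteLine(Build(fields));
    }
}
```
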
    

After this, result holds the post-login page returned by the CSRC margin monitoring site. To scrape data, just process this returned content directly.
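
For comparison, the same three steps can be sketched with `HttpClient`, where one `HttpClientHandler` carries the cookie container across all requests. This is not the code used above, only an alternative outline I have not run against the site. The class name, the `formFields` dictionary, `captchaFieldName` and the `askUserForCaptcha` callback are all assumptions supplied by the caller.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class LoginFlowSketch
{
    const string HomeUrl = "https://investorservice.cfmmc.com/";
    const string CaptchaUrl = "https://investorservice.cfmmc.com/veriCode.do";
    const string LoginUrl = "https://investorservice.cfmmc.com/login.do";

    // Creates a client whose requests all share the given cookie container.
    public static HttpClient CreateClient(CookieContainer jar)
    {
        var handler = new HttpClientHandler { CookieContainer = jar, UseCookies = true };
        return new HttpClient(handler) { Timeout = TimeSpan.FromSeconds(30) };
    }

    public static async Task<string> LoginAsync(
        HttpClient client,
        IDictionary<string, string> formFields, // hypothetical field names, supplied by the caller
        string captchaFieldName,                // hypothetical name of the CAPTCHA field
        string captchaPath,                     // where to save the image for the user
        Func<string> askUserForCaptcha)         // returns what the user typed
    {
        // Step 1: visit the login page so the server sets the session cookie.
        using (var home = await client.GetAsync(HomeUrl)) { }

        // Step 2: download the CAPTCHA in the same session and save it.
        byte[] image = await client.GetByteArrayAsync(CaptchaUrl);
        File.WriteAllBytes(captchaPath, image);

        // Step 3: a person reads the image; post the form with the same cookies.
        var fields = new Dictionary<string, string>(formFields);
        fields[captchaFieldName] = askUserForCaptcha();
        using (var body = new FormUrlEncodedContent(fields))
        using (var response = await client.PostAsync(LoginUrl, body))
        {
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

A caller would create one client with `CreateClient(new CookieContainer())` and pass it to `LoginAsync`. The returned string plays the same role as result above.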

That's all.

Last modification: December 18, 2015