SuperMiner Documentation

This module implements SuperMiner, a web miner that lets you download large amounts of content from a given website.

Elements in SuperMiner.py

Classes:

Functions:

Detailed information

Parameters (Id, Name, Class, Link, Partial_link, Tag, Xpath, and CSS are all parameters of the function Objects())

| Parameter | Value | Note |
| --- | --- | --- |
| browser | 'chrome' | Only Chrome is currently supported. |
| headless | False/True | Hide (True) or display (False) the browser GUI. Default: False. |
| url | Any URL you would like to open | Default: https://www.bing.com |
| Id | 'None'/other | Applies to Id, Name, Class, Link, Partial_link, Tag, Xpath, and CSS: only one of them needs to be given. If more than one is specified, only the first one is used for the search. All of them default to 'None'. |
| Name | 'None'/other | Same as Id. |
| Class | 'None'/other | Same as Id. |
| Link | 'None'/other | Same as Id. |
| Partial_link | 'None'/other | Same as Id. |
| Tag | 'None'/other | Same as Id. |
| Xpath | 'None'/other | Same as Id. |
| CSS | 'None'/other | Same as Id. |
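
The rule that only one locator needs to be given can be sketched as follows. This is a minimal illustration and not SuperMiner's actual code: the helper name `pick_locator` is hypothetical, and the priority order (the order in which the parameters are listed above) is an assumption.

```python
# Hypothetical helper, not part of SuperMiner.py: shows how a single locator
# could be chosen when several are passed.
LOCATOR_ORDER = ("Id", "Name", "Class", "Link", "Partial_link", "Tag", "Xpath", "CSS")


def pick_locator(**locators):
    """Return (name, value) of the first locator that is not 'None'.

    Assumptions: locators are checked in the order listed in the table above,
    and the string 'None' marks a locator that was not given.
    """
    for name in LOCATOR_ORDER:
        value = locators.get(name, "None")
        if value != "None":
            return name, value
    return None  # no locator was given


print(pick_locator(Class="thumb", Xpath="//img"))  # ('Class', 'thumb')
```

Here `Class` wins over `Xpath` only because it comes earlier in the assumed order; the real module may resolve ties differently.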

Internal members

| Member | Description | Available operations | Operation notes |
| --- | --- | --- | --- |
| engine | The engine of the web miner | engine.quit() | Closes the browser and ends all threads. |

Internal functions

| Function | Return value | Description |
| --- | --- | --- |
| MineEngine() | -1: an error occurred; 0: completed | Initializes the web miner engine and prepares the class member 'engine'. |
| Objects() | The list of objects found | Finds and returns the requested objects. |

Parameters

| Parameter | Value |
| --- | --- |
| Attribute | The attribute you want to locate, such as 'src' |
| Obj_list | The list of objects you found |

Return Value: The list of attribute values you want to find
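
As a rough sketch of what this function does, the snippet below reads one attribute from every object in a list. The names `FakeElement` and `collect_attribute` are hypothetical. The sketch assumes the objects behave like Selenium `WebElement` instances, which expose `get_attribute(name)`; a stand-in class is used so the example runs without a browser.

```python
class FakeElement:
    """Stand-in for a browser element (hypothetical, for illustration only)."""

    def __init__(self, **attrs):
        self._attrs = attrs

    def get_attribute(self, name):
        # Same call shape as Selenium's WebElement.get_attribute(name)
        return self._attrs.get(name)


def collect_attribute(attribute, obj_list):
    """Return the value of `attribute` for each object in `obj_list`."""
    return [obj.get_attribute(attribute) for obj in obj_list]


images = [
    FakeElement(src="https://example.com/a.png"),
    FakeElement(src="https://example.com/b.png"),
]
print(collect_attribute("src", images))
```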

Parameters

| Parameter | Value |
| --- | --- |
| Obj_list | The 2-dimensional list of the properties of the objects you found |

Return Value: [[text, id, location, tag_name, size], ...]; an empty list [] means that an error occurred
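
The return format above can be illustrated with a small sketch. The function name `collect_properties` is hypothetical, and the choice of which exception counts as an "error" is an assumption. The five field names match attributes found on Selenium `WebElement` objects; `SimpleNamespace` stands in for a real element here.

```python
from types import SimpleNamespace


def collect_properties(obj_list):
    """Return [[text, id, location, tag_name, size], ...], or [] on error."""
    try:
        return [[o.text, o.id, o.location, o.tag_name, o.size] for o in obj_list]
    except AttributeError:
        # Assumption: a missing property is reported as an empty list
        return []


element = SimpleNamespace(
    text="Search",
    id="abc123",
    location={"x": 10, "y": 20},
    tag_name="input",
    size={"height": 30, "width": 200},
)
rows = collect_properties([element])
print(rows[0][3])  # input
```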

Parameters

| Parameter | Value | Note |
| --- | --- | --- |
| engine | The engine object in SuperMiner | |
| Obj_list | The list of objects you found | Default: [] |
| Obj_index | The index of the object in Obj_list to act on | -1 means that all objects in Obj_list are acted on. |
| send_keys | True/False | Whether to send the message/keyboard/mouse actions. Default: False. |
| message | The text/keyboard/mouse action | Default: 'Hello world' |
| click | True/False | Default: False |
| clear | True/False | Clear the entered text. Default: False. |
| submit | True/False | Submit the form. Default: False. |
| right_click | True/False | Default: False |
| double_click | True/False | Default: False |
| rollpage | True/False | Scroll the page to load more information. Default: False. |
| roll_times | The number of times to scroll the page | Only takes effect when rollpage is True. Default: 20. |
| time_sleep | The sleep time after each page scroll | Gives the content time to load fully. Default: 1 s. |

Return Value: -1: an error occurred; any other value: completed
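
Two of the parameters above, Obj_index and the rollpage/roll_times/time_sleep group, can be sketched as follows. The helper names `select_targets` and `roll_page` are hypothetical, and the scroll step is passed in as a callable so that the example runs without a browser.

```python
import time


def select_targets(obj_list, obj_index):
    """Return the objects an action applies to; -1 selects every object."""
    if obj_index == -1:
        return list(obj_list)
    return [obj_list[obj_index]]


def roll_page(scroll_once, roll_times=20, time_sleep=1):
    """Call scroll_once() roll_times times, sleeping time_sleep seconds after each call."""
    for _ in range(roll_times):
        scroll_once()
        time.sleep(time_sleep)  # give newly loaded content time to appear


print(select_targets(["a", "b", "c"], -1))  # ['a', 'b', 'c']
print(select_targets(["a", "b", "c"], 1))   # ['b']

calls = []
roll_page(lambda: calls.append("scroll"), roll_times=3, time_sleep=0)
print(len(calls))  # 3
```

If `engine` is a Selenium WebDriver (an assumption; the document only states that it drives Chrome), the scroll step could be `lambda: engine.execute_script("window.scrollTo(0, document.body.scrollHeight);")`.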

Parameters

| Parameter | Value | Note |
| --- | --- | --- |
| engine | The miner engine | |
| url_list | The list of URLs of the objects | You can get it from Attributes(). |
| data_type | The type of data you want to download | 'img', 'text', or 'page'; 'page' is currently recommended instead of 'text'. |
| file_type | The type of file used to save the data | Default: '.html' |
| folder_name | The name of the folder where downloaded data is saved | Default: 'downloads' |
| encode | The encoding of non-image files | Default: 'utf-8' |
| web_name | Whether to name the file after the web link | Default: True |
| web_index | The index of the '/' in the site link; the file name is taken from this '/' to the end of the link | |

Return Value: -1: an error occurred; any other value: completed
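
The way web_index, folder_name, and file_type could combine into a file path is sketched below. This is one possible reading, not the module's actual implementation: the helper names are hypothetical, slashes are assumed to be counted from 0, and any remaining '/' is replaced with '_' so that the result is a single valid file name.

```python
import os


def file_name_from_url(url, web_index, file_type=".html"):
    """Build a file name from the part of `url` after its web_index-th '/'.

    Assumptions: slashes are counted from 0, and any '/' left in the tail is
    replaced with '_'.
    """
    pos = -1
    for _ in range(web_index + 1):
        pos = url.find("/", pos + 1)
        if pos == -1:
            raise ValueError("url has fewer '/' characters than web_index requires")
    tail = url[pos + 1:].strip("/").replace("/", "_")
    return tail + file_type


def target_path(url, web_index, folder_name="downloads", file_type=".html"):
    """Join the folder name and the derived file name."""
    return os.path.join(folder_name, file_name_from_url(url, web_index, file_type))


print(file_name_from_url("https://www.bing.com/images/search", 2))  # images_search.html
print(file_name_from_url("https://www.bing.com/images/search", 3))  # search.html
```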