RankMass Crawler: A Crawler with High PageRank Coverage Guarantee

Junghoo Cho; Uri Schonfeld

Publication Information
Venue: Very Large Data Bases (VLDB) conference, 2007, pp. 375–386
Type: proceedings article (language: English)
Landing page: https://www.vldb.org/conf/2007/papers/research/p375-cho.pdf
Identifiers: OpenAlex W14315322; MAG 14315322; no DOI recorded
Access: closed (not open access)
Citations: 13 (2012: 3; 2013: 2; 2015: 1; 2017: 1; 2018: 1); field-weighted citation impact 0.726
References: 26 works

Authors
Junghoo Cho, University of California, Los Angeles
Uri Schonfeld (no affiliation listed)

Topics
Web Data Extraction and Crawling Techniques (primary topic; subfield Information Systems, field Computer Science)
Content-Centric Networking for Information Delivery
Text Compression and Indexing Algorithms

Keywords
Web crawler, Crawling, Focused crawler, Upload, Web Crawling, Web Data Extraction, Caching, Page Segmentation, Edge Caching, Download, HITS algorithm, PageRank

Related Works (OpenAlex IDs)
W3205545366, W2997495867, W2920184028, W2899211245, W28427753, W2552722382, W2333562347, W2323049050, W2185961408, W2185053453, W2184149344, W2182042776, W2128919237, W2105302463, W2066309116, W2064214115, W2030453570, W2029341294, W1587535619, W14800140

Abstract
Crawling algorithms have been the subject of extensive research and optimizations, but some important questions remain open. In particular, given the infinite number of pages available on the Web, search-engine operators constantly struggle with the following vexing questions: When can I stop downloading the Web? How many pages should I download to cover “most” of the Web? How can I know I am not missing an important part when I stop? In this paper we provide an answer to these questions by developing a family of crawling algorithms that (1) provide a theoretical guarantee on how much of the “important” part of the Web it will download after crawling a certain number of pages and (2) give a high priority to important pages during a crawl, so that the search engine can index the most important part of the Web first. We prove the correctness of our algorithms by theoretical analysis and evaluate their performance experimentally based on 141 million URLs obtained from the Web. Our experiments demonstrate that even our simple algorithm is effective in downloading important pages early on and provides high “coverage” of the Web with a relatively small number of pages.
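
The abstract states two goals, a provable lower bound on the PageRank mass downloaded so far and early download of high-PageRank pages, without spelling out the algorithm. The Python sketch below shows one way to meet both goals under stated assumptions; it is not a transcription of the paper's algorithms. It assumes importance is PageRank with teleport probability `alpha` to a trusted seed distribution `seeds`, that a page with no out-links sends the surfer back to the seeds, and it uses a push-style (local) PageRank approximation as the swapped-in technique. The names `rankmass_style_crawl` and `fetch_links`, the default values of `alpha` and `epsilon`, and the dict-based toy web are all hypothetical. The invariant that makes the guarantee work: each page's `estimate` never exceeds its true PageRank, and the estimates plus the remaining `residual` always sum to 1, so once the residual falls to `epsilon` or below, the downloaded pages hold at least 1 - `epsilon` of the total PageRank mass.

```python
from typing import Callable, Dict, Hashable, List, Tuple

Page = Hashable  # any hashable page identifier, e.g. a URL string


def rankmass_style_crawl(
    seeds: Dict[Page, float],
    fetch_links: Callable[[Page], List[Page]],
    alpha: float = 0.15,
    epsilon: float = 0.1,
) -> Tuple[List[Page], float]:
    """Download pages until a lower bound on their total PageRank reaches 1 - epsilon.

    Hypothetical assumptions: `seeds` is the teleport distribution and sums
    to 1, `alpha` is the teleport probability, `fetch_links` returns a page's
    true out-links, and a page with no out-links jumps back to the seeds.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive, or the loop may never end")

    estimate: Dict[Page, float] = {}            # lower bound on each page's PageRank
    residual: Dict[Page, float] = dict(seeds)  # PageRank mass not yet assigned to a page
    links: Dict[Page, List[Page]] = {}          # cached out-links of downloaded pages
    order: List[Page] = []                      # pages in download order

    while sum(residual.values()) > epsilon:
        # Greedy priority: process the page currently holding the most unassigned mass.
        page = max(residual, key=residual.get)
        mass = residual.pop(page)
        if page not in links:
            links[page] = fetch_links(page)     # the only step that costs a download
            order.append(page)
        # Keep the teleport share of the mass on this page ...
        estimate[page] = estimate.get(page, 0.0) + alpha * mass
        # ... and pass the rest along its out-links, or back to the seeds if dangling.
        out = links[page]
        if out:
            share = (1 - alpha) * mass / len(out)
            targets = [(target, share) for target in out]
        else:
            targets = [(s, (1 - alpha) * mass * w) for s, w in seeds.items()]
        for target, amount in targets:
            residual[target] = residual.get(target, 0.0) + amount

    # estimate + residual always sums to 1, so (up to float rounding) this is >= 1 - epsilon.
    covered = sum(estimate.values())
    return order, covered
```

For example, on a toy web `{"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}` crawled from seed `a` with `epsilon=0.01`, the sketch downloads `a`, `b` and `c` and never fetches `d`, which nothing links to; raising `epsilon` to 0.9 stops after the first download with a guaranteed coverage of 0.15. A page that receives new mass after it has been downloaded is processed again from its cached links without a second fetch. The linear `max` scan keeps the sketch short; a priority queue with lazy deletion would be the more usual choice at crawl scale.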