N-gram Based Two-Step Algorithm for Word Segmentation

Dong-Hee Lim; Kyu-Baek Hwang; Seung-Shik Kang
Venue: Meeting of the Association for Computational Linguistics (conference proceedings article)
Published: 2006-07-01, pages 197-200
ACL Anthology: https://aclanthology.org/W06-0136/
OpenAlex: https://openalex.org/W7593531
Access: closed (no open-access copy listed)
Keywords: Bigram, n-gram, Smoothing, Text segmentation, Text Detection
Topics: Statistical Machine Translation and Natural Language Processing; Handwriting Recognition and Text Detection; Text Compression and Indexing Algorithms

Abstract

This paper describes an n-gram based reinforcement approach to the closed track of word segmentation in the third Chinese word segmentation bakeoff. Character unigram, bigram, and trigram features are extracted from the training corpus, and their frequencies are counted. We investigated a step-by-step methodology based on these n-gram statistics. In the first step, relatively definite segmentations are fixed by a tight threshold value. The remaining tags are decided by considering the left or right space tags that were already fixed in the first step. Definite and loose segmentations are performed simply based on bigram and trigram statistics. To overcome the data sparseness problem of the bigram data, unigram statistics are used for smoothing.
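The two-step procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: segmentation is modeled as tagging each gap between adjacent characters as a word boundary (1) or not (0); the bigram boundary estimate is smoothed by interpolating with a unigram estimate using a hypothetical weight `k`; step 1 fixes only gaps whose smoothed bigram probability passes hypothetical tight thresholds `hi`/`lo`; and step 2 decides each remaining gap from trigram counts conditioned on an adjacent gap already fixed in step 1, falling back to the bigram estimate. All names (`gap_tags`, `TwoStepSegmenter`, `bigram_prob`, `segment`) and default parameter values are illustrative.

```python
from collections import defaultdict


def gap_tags(words):
    """Turn a segmented sentence into (text, tags), where tags[i] is 1 if a
    word boundary ("space") falls between text[i] and text[i + 1], else 0."""
    chars, tags = [], []
    for word in words:
        for j, ch in enumerate(word):
            if chars:
                tags.append(1 if j == 0 else 0)
            chars.append(ch)
    return "".join(chars), tags


class TwoStepSegmenter:
    """Hypothetical two-step segmenter: bigram-based definite tagging first,
    then trigram-based tagging of the remaining gaps."""

    def __init__(self, sentences, k=5.0):
        self.k = k  # assumed interpolation weight for unigram smoothing
        # Each table maps a context to [boundary count, total count].
        self.bi = defaultdict(lambda: [0, 0])     # (a, b)
        self.uni_l = defaultdict(lambda: [0, 0])  # a: boundary after a
        self.uni_r = defaultdict(lambda: [0, 0])  # b: boundary before b
        self.tri_l = defaultdict(lambda: [0, 0])  # (prev, a, b, tag of prev|a gap)
        self.tri_r = defaultdict(lambda: [0, 0])  # (a, b, nxt, tag of b|nxt gap)
        for words in sentences:
            text, tags = gap_tags(words)
            for i, tag in enumerate(tags):
                a, b = text[i], text[i + 1]
                keys = [(self.bi, (a, b)), (self.uni_l, a), (self.uni_r, b)]
                if i > 0:
                    keys.append((self.tri_l, (text[i - 1], a, b, tags[i - 1])))
                if i + 1 < len(tags):
                    keys.append((self.tri_r, (a, b, text[i + 2], tags[i + 1])))
                for table, key in keys:
                    table[key][0] += tag
                    table[key][1] += 1

    @staticmethod
    def _counts(table, key):
        # .get avoids inserting unseen keys into the defaultdict.
        return table.get(key, (0, 0))

    def bigram_prob(self, a, b):
        """Boundary probability for the gap a|b, smoothed with unigram counts."""
        ub, un = self._counts(self.uni_l, a)
        vb, vn = self._counts(self.uni_r, b)
        p_uni = (ub + vb) / (un + vn) if un + vn else 0.5  # 0.5 if both unseen
        bb, bn = self._counts(self.bi, (a, b))
        # Sparse bigrams (small bn) lean on the unigram estimate.
        return (bb + self.k * p_uni) / (bn + self.k)

    def segment(self, text, hi=0.9, lo=0.1):
        if not text:
            return []
        probs = [self.bigram_prob(text[i], text[i + 1]) for i in range(len(text) - 1)]
        # Step 1: fix only "definite" gaps, using tight thresholds.
        fixed = [1 if p >= hi else 0 if p <= lo else None for p in probs]
        tags = list(fixed)
        # Step 2: decide each remaining gap from trigram statistics conditioned
        # on a neighbouring gap fixed in step 1 (left first, then right).
        for i, tag in enumerate(fixed):
            if tag is not None:
                continue
            p = probs[i]  # fallback: smoothed bigram estimate
            contexts = []
            if i > 0 and fixed[i - 1] is not None:
                contexts.append((self.tri_l, (text[i - 1], text[i], text[i + 1], fixed[i - 1])))
            if i + 1 < len(fixed) and fixed[i + 1] is not None:
                contexts.append((self.tri_r, (text[i], text[i + 1], text[i + 2], fixed[i + 1])))
            for table, key in contexts:
                b, n = self._counts(table, key)
                if n:
                    p = b / n
                    break
            tags[i] = 1 if p >= 0.5 else 0
        # Split the text at every gap tagged as a boundary.
        words, start = [], 0
        for i, tag in enumerate(tags):
            if tag:
                words.append(text[start:i + 1])
                start = i + 1
        words.append(text[start:])
        return words
```

As a toy illustration, train on `[["ab", "c"], ["xbc"]]`: the gap between `b` and `c` is a boundary in one sentence and not in the other, so its smoothed bigram estimate is exactly 0.5 and step 1 leaves it open. Step 2 then uses the already-fixed left gap and the trigram context, so `segment("abc")` returns `['ab', 'c']` while `segment("xbc")` returns `['xbc']`. The interpolation gives a bigram seen `n` times a weight of `n / (n + k)` on its own counts, which is one simple way to let unigram statistics cover sparse bigrams.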