Reinforcement Learning for POMDP Using State Classification.

Le Dung; Takashi Komeda; Motoki Takagi
MLMTA 2007, pp. 45-51
https://dblp.uni-trier.de/db/conf/mlmta/mlmta2007.html#LeKT07
OpenAlex: https://openalex.org/W2173087131

Affiliations: Le Tien Dung, Graduate School of Engineering, Shibaura Institute of Technology, Saitama, Japan; Takashi Komeda and Motoki Takagi, Faculty of System Engineering, Shibaura Institute of Technology, Saitama, Japan

Abstract

Reinforcement learning (RL) has been widely used to solve problems with little feedback from the environment. Q-learning can solve Markov decision processes (MDPs) quite well. For partially observable Markov decision processes (POMDPs), a recurrent neural network (RNN) can be used to approximate Q values. However, the learning time for these problems is typically very long. We present a new combination of RL and RNN to find a good policy for POMDPs in a shorter learning time. This method contains two phases: first, the state space is divided into two groups (a fully observable state group and a hidden state group); second, a Q value table is used to store values of fully observable states and an RNN is used to approximate values for hidden states. Results of experiments in two grid world problems show that the proposed method enables an agent to acquire a policy with better learning performance compared to the method using only an RNN.
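The two-phase scheme described in the abstract can be illustrated roughly as follows. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not say how states are classified, so the criterion used here (an observation is fully observable when exactly one underlying state produces it) is a guess. The names `classify_observations`, `TinyRecurrentQ`, and `HybridQ`, the hidden size of 4, the weight range, and the values of `alpha` and `gamma` are all hypothetical. The recurrent part is a single Elman-style layer with a forward pass only; training it (for example by backpropagation through time) is omitted, and advancing it on every step is a design choice made for this sketch.

```python
import math
import random
from collections import defaultdict


def classify_observations(obs_to_states):
    """Phase 1, under an ASSUMED criterion: an observation counts as fully
    observable when exactly one underlying state produces it."""
    observable = {o for o, states in obs_to_states.items() if len(states) == 1}
    hidden = set(obs_to_states) - observable
    return observable, hidden


class TinyRecurrentQ:
    """Hypothetical stand-in for the RNN: one Elman-style tanh layer,
    forward pass only (no training)."""

    def __init__(self, n_inputs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.w_in = init(n_hidden, n_inputs)
        self.w_rec = init(n_hidden, n_hidden)
        self.w_out = init(n_actions, n_hidden)
        self.h = [0.0] * n_hidden

    def reset(self):
        """Clear the recurrent state at the start of an episode."""
        self.h = [0.0] * len(self.h)

    def step(self, x):
        """h_t = tanh(W_in x_t + W_rec h_{t-1}); returns Q = W_out h_t."""
        prev = self.h
        self.h = [
            math.tanh(
                sum(w * xi for w, xi in zip(row_in, x))
                + sum(w * hi for w, hi in zip(row_rec, prev))
            )
            for row_in, row_rec in zip(self.w_in, self.w_rec)
        ]
        return [sum(w * hi for w, hi in zip(row, self.h)) for row in self.w_out]


class HybridQ:
    """Phase 2: table lookups for observable observations, recurrent
    estimates for hidden ones."""

    def __init__(self, observable, n_obs, n_actions, alpha=0.1, gamma=0.9):
        self.observable = set(observable)
        self.n_obs = n_obs
        self.alpha = alpha  # learning rate (assumed value)
        self.gamma = gamma  # discount factor (assumed value)
        self.table = defaultdict(lambda: [0.0] * n_actions)
        self.rnn = TinyRecurrentQ(n_obs, n_hidden=4, n_actions=n_actions)

    def q_values(self, obs):
        """Return Q estimates for an integer observation index."""
        one_hot = [1.0 if i == obs else 0.0 for i in range(self.n_obs)]
        # Assumption: the recurrent state advances on every step so that
        # history is available once a hidden observation is reached.
        rnn_q = self.rnn.step(one_hot)
        return list(self.table[obs]) if obs in self.observable else rnn_q

    def update_table(self, obs, action, reward, next_q):
        """Tabular Q-learning update, applied only to observable observations."""
        if obs in self.observable:
            q = self.table[obs]
            q[action] += self.alpha * (reward + self.gamma * max(next_q) - q[action])
```

In use, one would call `classify_observations` once, build a `HybridQ` from the resulting observable set, and then call `q_values` at each step and `update_table` after each transition. Hidden observations never touch the table in this sketch, which is where a separate RNN training step would have to go.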