机器之心 | Train, test, and use models without writing a single line of code: this project does it for you


Reported by 机器之心
机器之心 Editorial Team
igel is a trending tool on GitHub. Built on top of scikit-learn, it supports all of sklearn's machine learning features, such as regression, classification, and clustering. Users can work with machine learning models without writing a single line of code; all it takes is a yaml or json file describing what you want to do.

Train, test, and use a model without writing a single line of code: can it really be that easy?
Recently, software engineer Nidhal Baccouri open-sourced just such a machine learning tool on GitHub: igel. It quickly made GitHub's trending list and has already collected 1.5k stars.
Project address: https://github.com/nidhaloff/igel
The project aims to give everyone, technical or not, a convenient way to use machine learning.
The author describes the motivation for creating igel as follows: "Sometimes I need a tool to quickly prototype a machine learning model, whether for a proof of concept or to create a quick draft model. I often found myself stuck writing boilerplate code or fretting over how to start. So I decided to create igel."
The basic idea is to group all configuration in a human-readable yaml or json file, including the model definition, data preprocessing methods, and so on, and then let igel automate everything else. The user describes what they need in the yaml or json file; igel then builds a model from that configuration, trains it, and returns the results along with metadata.
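To make this concrete, a minimal configuration might look like the sketch below. The field names follow the schema shown in igel's README, but the algorithm choice and the target column name are illustrative assumptions, not values taken from this article:

```yaml
# minimal igel configuration sketch (illustrative values)
model:
  type: classification      # regression, classification, or clustering
  algorithm: RandomForest   # an sklearn-backed algorithm supported by igel
target:
  - sick                    # column(s) to predict; hypothetical column name
```

Training would then be launched from the command line with something like `igel fit --data_path 'train.csv' --yaml_path 'igel.yaml'` (command syntax as documented in the igel README; check `igel --help` for the version you have installed).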
All configurations currently supported by igel are shown below:
# dataset operations
dataset:
    type: csv  # [str] -> type of your dataset
    read_data_options:  # options you want to supply for reading your data (See the detailed overview about this in the next section)
        sep:  # [str] -> Delimiter to use.
        delimiter:  # [str] -> Alias for sep.
        header:  # [int, list of int] -> Row number(s) to use as the column names, and the start of the data.
        names:  # [list] -> List of column names to use
        index_col:  # [int, str, list of int, list of str, False] -> Column(s) to use as the row labels of the DataFrame
        usecols:  # [list, callable] -> Return a subset of the columns
        squeeze:  # [bool] -> If the parsed data only contains one column then return a Series.
        prefix:  # [str] -> Prefix to add to column numbers when no header, e.g. 'X' for X0, X1, ...
        mangle_dupe_cols:  # [bool] -> Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than 'X'...'X'. Passing in False will cause data to be overwritten if there are duplicate names in the columns.
        dtype:  # [Type name, dict mapping column name to type] -> Data type for data or columns
        engine:  # [str] -> Parser engine to use. The C engine is faster while the python engine is currently more feature-complete.
        converters:  # [dict] -> Dict of functions for converting values in certain columns. Keys can either be integers or column labels.
        true_values:  # [list] -> Values to consider as True.
        false_values:  # [list] -> Values to consider as False.
        skipinitialspace:  # [bool] -> Skip spaces after delimiter.
        skiprows:  # [list-like] -> Line numbers to skip (0-indexed) or number of lines to skip (int) at the start of the file.
        skipfooter:  # [int] -> Number of lines at bottom of file to skip
        nrows:  # [int] -> Number of rows of file to read. Useful for reading pieces of large files.
        na_values:  # [scalar, str, list, dict] -> Additional strings to recognize as NA/NaN.
        keep_default_na:  # [bool] -> Whether or not to include the default NaN values when parsing the data.
        na_filter:  # [bool] -> Detect missing value markers (empty strings and the value of na_values). In data without any NAs, passing na_filter=False can improve the performance of reading a large file.
        verbose:  # [bool] -> Indicate number of NA values placed in non-numeric columns.
        skip_blank_lines:  # [bool] -> If True, skip over blank lines rather than interpreting as NaN values.
        parse_dates:  # [bool, list of int, list of str, list of lists, dict] -> try parsing the dates
        infer_datetime_format:  # [bool] -> If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them.
        keep_date_col:  # [bool] -> If True and parse_dates specifies combining multiple columns then keep the original columns.
        dayfirst:  # [bool] -> DD/MM format dates, international and European format.
        cache_dates:  # [bool] -> If True, use a cache of unique, converted dates to apply the datetime conversion.
        thousands:  # [str] -> the thousands separator
        decimal:  # [str] -> Character to recognize as decimal point (e.g. use ',' for European data).
        lineterminator:  # [str] -> Character to break file into lines.
        escapechar:  # [str] -> One-character string used to escape other characters.
        comment:  # [str] -> Indicates remainder of line should not be parsed. If found at the beginning of a line, the line will be ignored altogether. This parameter must be a single character.
        encoding:  # [str] -> Encoding to use for UTF when reading/writing (ex. 'utf-8').
        dialect:  # [str, csv.Dialect] -> If provided, this parameter will override values (default or not) for the following parameters: delimiter, doublequote, escapechar, skipinitialspace, quotechar, and quoting
        delim_whitespace:  # [bool] -> Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep
        low_memory:  # [bool] -> Internally process the file in chunks, resulting in lower memory use while parsing, but possibly mixed type inference.
        memory_map:  # [bool] -> If a filepath is provided for filepath_or_buffer, map the file object directly onto memory and access the data directly from there. Using this option can improve performance because there is no longer any I/O overhead.
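The read_data_options above mirror the parameters of pandas' read_csv (the descriptions are the pandas ones), which igel relies on to load csv datasets. A small standalone pandas sketch, independent of igel, shows how a few of them, sep, decimal, and comment, interact:

```python
import io

import pandas as pd

# A small csv with a ';' delimiter, a European-style decimal comma,
# and a comment line that should be skipped during parsing
raw = """name;score
# this line should be ignored
alice;3,5
bob;4,0
"""

df = pd.read_csv(
    io.StringIO(raw),
    sep=";",      # corresponds to read_data_options.sep
    decimal=",",  # corresponds to read_data_options.decimal
    comment="#",  # corresponds to read_data_options.comment
)

print(df)
print(df["score"].dtype)  # scores parse as floats thanks to decimal=","
```

With these three options, "3,5" and "4,0" land in the DataFrame as the floats 3.5 and 4.0, and the comment line never reaches the parser's output.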

