Data, Knowledge & Life

everyday experience on science, internet, and hobbies


All views expressed here are my own, not my employer's.

Please credit the source when reposting!

Creative Commons License

What's New

http://friendfeed.com/chengweiwei

What Is a PhD (and What Does Doing One Mean)?

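  What is a PhD, and what does it actually mean to pursue one? These are probably questions every future PhD student wants answered. Everyone knows the doctorate is the highest degree in today's education system; beyond that, most people can't say much. Matt Might of the University of Utah made a set of illustrations that capture it remarkably well. Here I'll narrate his pictures and pass his work along.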


Imagine a circle that contains all of human knowledge.


By the time we finish elementary school, we have picked up some basic knowledge.


By the time we finish high school, our knowledge has grown noticeably.


Once you finish your undergraduate studies (a bachelor's degree), you have acquired a specialty.


In a master's program, we usually deepen that specialty further, in the same direction as during the bachelor's.


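When you prepare for a PhD (at the end of your master's) or start one, reading research papers should bring you to the frontier of human knowledge in some field.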


Once your knowledge reaches that level, you focus.


For the next several years, you keep pushing against the boundary of existing knowledge, looking for a breakthrough.


Until one day, you break through the boundary.


Don't underestimate that tiny bump: it is what earns you your PhD.


Of course, at that moment the world suddenly looks different to you.


But don't lose sight of the forest for the trees.


Keep pushing!


Below is my reply to a thread on the 德国热线 ("Germany Hotline") forum:

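  As the saying goes, different trades are separated by mountains. It is perfectly normal for people in different fields to misunderstand one another. A biology PhD student once asked me what computer science PhD students actually do, and why they would bother doing a PhD at all.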

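  Computer science PhD students do spend a lot of time on very practical work such as programming, but the purpose is entirely different from that of bachelor's and master's studies. At those levels, the main task is learning how to implement given functionality within an existing body of knowledge. At the PhD level, the work, programming included, aims to keep advancing the performance of existing algorithms and systems, mainly from a theoretical angle and secondarily from a practical one. As the often-heard phrase puts it, the goal is "shifting the boundary." Testing programs is often just one way of validating a new theory. Because modern computer science is so large and so finely subdivided, it is hard to generalize about what CS PhD students do. A quick look at the graduate schools of the world's top universities shows that nearly all of them offer doctoral programs in computer science and related fields. At the research labs of IT giants such as Google, Microsoft and Yahoo, a PhD in computer science is almost a prerequisite for many positions, especially Research Scientist roles.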

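  Admittedly, from a practical standpoint, graduates in many science and engineering disciplines, computer science included, already have considerable development skills after a bachelor's or master's degree. That meets the needs of much real-world work and offers good job prospects. If you have no interest in research at all, there is of course no need to do a PhD. That is probably true of every discipline.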

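  I have always felt that "Computer Science" is a poor name that doesn't properly capture what the discipline is about. The German name, Informatik, fits much better. Going by the English name alone, you might think computer science is the study of computers. In fact, computers relate to computer science roughly the way telescopes relate to astronomy.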

  Good luck to the original poster!

Embarrassing Interview Moments

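  An article by ResumeBear, Embarrassing Interview Moments and How to Avoid Them, describes many awkward scenes from job interviews. In the spirit of "correct it if you have it, guard against it if you don't," I'm sharing them here as a reminder and a bit of entertainment for myself. I have only translated them; readers, please don't take any of them personally.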

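  • A candidate sent an older sister to the interview in their place.
  • A candidate started dancing in the middle of the interview, shouting "Oh yeah, I love life!"
  • A candidate showed up with a parrot perched on their shoulder.
  • A candidate interrupted the interviewer mid-interview to ask for a cigarette.
  • A male candidate, leaving the office at the end of his interview, walked straight into the office's glass door. The door shattered.
  • A candidate got the company wrong. Throughout the interview he kept praising another company, which turned out to be this company's archrival.
  • A candidate never once pronounced the interviewer's name correctly during the entire interview.
  • At an interview with a retail company, a candidate was asked why she wanted to work there. She answered: "Because I never want to work in retail again."
  • A candidate took the gum out of his mouth with his fingers (that much is fine), then tried to shake the interviewer's hand with that same hand.
  • A candidate, apparently aware that he gestures involuntarily when he talks, sat on his hands for the entire interview.
  • A candidate fell asleep during the interview.
  • A candidate showed up to the interview in pajamas, hair a mess, as if just out of bed.
  • A candidate didn't notice a big hole in the seat of his pants.
  • One candidate's interview went very smoothly. At the end, the interviewer asked why she was leaving her previous job; she said it was because everyone there was out to get her.
  • A candidate mocked the interviewer for his badly knotted tie.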

Success Far from Home

Photo: Weiwei Cheng at the award ceremony

Doctoral student in Marburg's Department of Mathematics and Computer Science receives Chinese award

Weiwei Cheng, a doctoral student in Marburg's Department of Mathematics and Computer Science, has received the "Chinese Government Award for Outstanding Self-Financed Students Abroad."

Cheng's research focuses on so-called preference learning. For example, from given data and observations about the websites a person prefers to visit, he develops ranking models that can be used to suggest alternatives to that user. In his speech at the award ceremony at the Chinese Embassy in Berlin, which he also gave on behalf of the 36 other honorees, Cheng thanked his Marburg mentor Professor Dr. Eyke Hüllermeier for having "introduced me to the research field of machine learning."

Hüllermeier praises his doctoral student's diligence and ambition: "He has everything a good scientist needs, and he communicates his work very well."

The prize, worth 5,000 dollars, is awarded annually by the "China Scholarship Council" to young Chinese researchers who achieve above-average results during graduate study abroad and who are self-financed, that is, not dependent on government funding.

Source: Press Office of the Philipps-Universität Marburg
Update: An article in the Oberhessische Presse

The Most Important Algorithms (in CS and Math)


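  Among the colleagues I know, nearly everyone has a few favorite algorithms. Below are the 32 classes of algorithms that Christoph Koutschan lists as the most important in computer science and mathematics (in alphabetical order). The list covers a lot of ground, and the descriptions are spot on.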
  1. A* search algorithm
    Graph search algorithm that finds a path from a given initial node to a given goal node. It employs a heuristic estimate that ranks each node by an estimate of the best route that goes through that node. It visits the nodes in order of this heuristic estimate. The A* algorithm is therefore an example of best-first search.
  2. Beam Search
    Beam search is a search algorithm that is an optimization of best-first search. Like best-first search, it uses a heuristic function to evaluate the promise of each node it examines. Beam search, however, expands only the m most promising nodes at each depth, where m is a fixed number, the beam width.
  3. Binary search
    Technique for finding a particular value in a sorted linear array, by ruling out half of the data at each step.
  4. Branch and bound
    A general algorithmic method for finding optimal solutions of various optimization problems, especially in discrete and combinatorial optimization.
  5. Buchberger's algorithm
    In computational algebraic geometry and computational commutative algebra, Buchberger's algorithm is a method of transforming a given set of generators for a polynomial ideal into a Gröbner basis with respect to some monomial order. One can view it as a generalization of the Euclidean algorithm for univariate gcd computation and of Gaussian elimination for linear systems.
  6. Data compression
    Data compression, or source coding, is the process of encoding information with specific encoding schemes so that it uses fewer bits (or other information-bearing units) than an unencoded representation would.
  7. Diffie-Hellman key exchange
    Cryptographic protocol which allows two parties that have no prior knowledge of each other to jointly establish a shared secret key over an insecure communications channel. This key can then be used to encrypt subsequent communications using a symmetric key cipher.
  8. Dijkstra's algorithm
    Algorithm that solves the single-source shortest path problem for a directed graph with nonnegative edge weights.
  9. Discrete differentiation
    I.e., the central-difference formula f'(x) ≈ (f(x+h) - f(x-h)) / (2h).
  10. Dynamic programming
    Dynamic programming is a method for reducing the runtime of algorithms exhibiting the properties of overlapping subproblems and optimal substructure.
  11. Euclidean algorithm
    Algorithm to determine the greatest common divisor (gcd) of two integers. It is one of the oldest algorithms known, since it appeared in Euclid's Elements around 300 BC. The algorithm does not require factoring the two integers.
  12. Expectation-maximization algorithm (EM-Training)
    In statistical computing, an expectation-maximization (EM) algorithm is an algorithm for finding maximum likelihood estimates of parameters in probabilistic models, where the model depends on unobserved latent variables. EM alternates between performing an expectation step, which computes the expected value of the latent variables, and a maximization step, which computes the maximum likelihood estimates of the parameters given the data and setting the latent variables to their expectation.
  13. Fast Fourier transform (FFT)
    Efficient algorithm to compute the discrete Fourier transform (DFT) and its inverse. FFTs are of great importance to a wide variety of applications, from digital signal processing to solving partial differential equations to algorithms for quickly multiplying large integers.
  14. Gradient descent
    Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient (or the approximate gradient) of the function at the current point. If instead one takes steps proportional to the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.
  15. Hashing
    A function for summarizing or probabilistically identifying data. Typically this means one applies a mathematical formula to the data, producing a string which is probably more or less unique to that data. The string is much shorter than the original data, but can be used to identify it with high probability.
  16. Heaps (heap sort)
    In computer science a heap is a specialized tree-based data structure. Heaps are favourite data structures for many applications: Heap sort, selection algorithms (finding the min, max or both of them, median or even any kth element in sublinear time), graph algorithms.
  17. Karatsuba multiplication
    For systems that need to multiply numbers in the range of several thousand digits, such as computer algebra systems and bignum libraries, long multiplication is too slow. These systems employ Karatsuba multiplication, which was discovered in 1962.
  18. LLL algorithm
    The Lenstra-Lenstra-Lovász lattice reduction (LLL) algorithm is an algorithm which, given a lattice basis as input, outputs a basis with short, nearly orthogonal vectors. The LLL algorithm has found numerous applications in cryptanalysis of public-key encryption schemes: knapsack cryptosystems, RSA with particular settings, and so forth.
  19. Maximum flow
    The maximum flow problem is finding a legal flow through a flow network that is maximal. Sometimes it is defined as finding the value of such a flow. The maximum flow problem can be seen as a special case of more complex network flow problems. The maximal flow is related to the cuts in a network by the max-flow min-cut theorem. The Ford-Fulkerson algorithm computes the maximum flow in a flow network.
  20. Merge sort
    A sorting algorithm for rearranging lists (or any other data structure that can only be accessed sequentially, e.g. file streams) into a specified order.
  21. Newton's method
    Efficient algorithm for finding approximations to the zeros (or roots) of a real-valued function. Newton's method is also a well-known algorithm for finding roots of equations in one or more dimensions. It can also be used to find local maxima and local minima of functions.
  22. Q-learning
    Q-learning is a reinforcement learning technique that works by learning an action-value function that gives the expected utility of taking a given action in a given state and following the optimal policy thereafter. A strength of Q-learning is that it can compare the expected utility of the available actions without requiring a model of the environment.
  23. Quadratic sieve
    The quadratic sieve algorithm (QS) is a modern integer factorization algorithm and, in practice, the second fastest method known (after the number field sieve, NFS). It is still the fastest for integers under 110 decimal digits or so, and is considerably simpler than the number field sieve.
  24. RANSAC
    RANSAC is an abbreviation for "RANdom SAmple Consensus". It is an algorithm to estimate parameters of a mathematical model from a set of observed data which contains "outliers". A basic assumption is that the data consists of "inliers", i.e., data points which can be explained by some set of model parameters, and "outliers", which are data points that do not fit the model.
  25. RSA
    Algorithm for public-key encryption. It was the first algorithm known to be suitable for signing as well as encryption. RSA is still widely used in electronic commerce protocols, and is believed to be secure given sufficiently long keys.
  26. Schönhage-Strassen algorithm
    In mathematics, the Schönhage-Strassen algorithm is an asymptotically fast method for multiplication of large integer numbers. The run-time is O(N log(N) log(log(N))). The algorithm uses Fast Fourier Transforms in rings.
  27. Simplex algorithm
    In mathematical optimization theory, the simplex algorithm is a popular technique for the numerical solution of the linear programming problem. A linear programming problem consists of a collection of linear inequalities on a number of real variables and a fixed linear functional which is to be maximized (or minimized).
  28. Singular value decomposition (SVD)
    In linear algebra, SVD is an important factorization of a rectangular real or complex matrix, with several applications in signal processing and statistics, e.g., computing the pseudoinverse of a matrix (to solve the least squares problem), solving overdetermined linear systems, matrix approximation, numerical weather prediction.
  29. Solving a system of linear equations
    Systems of linear equations belong to the oldest problems in mathematics and they have many applications, such as in digital signal processing, estimation, forecasting and generally in linear programming and in the approximation of non-linear problems in numerical analysis. An efficient way to solve systems of linear equations is given by Gauss-Jordan elimination or, for symmetric positive-definite systems, by the Cholesky decomposition.
  30. Structure tensor (Strukturtensor)
    In pattern recognition: computes a measure for every pixel which tells you whether the pixel lies in a homogeneous region, belongs to an edge, or is a vertex.
  31. Union-find
    Given a set of elements, it is often useful to partition them into a number of separate, nonoverlapping groups. A disjoint-set data structure is a data structure that keeps track of such a partitioning. A union-find algorithm is an algorithm that performs two useful operations on such a data structure:
    Find: Determine which group a particular element is in.
    Union: Combine or merge two groups into a single group.
  32. Viterbi algorithm
    Dynamic programming algorithm for finding the most likely sequence of hidden states - known as the Viterbi path - that result in a sequence of observed events, especially in the context of hidden Markov models.
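Several of the shorter entries above fit in a few lines of code. Below is a minimal Python sketch of four of them: the Euclidean algorithm (11), binary search (3), union-find (31) and Dijkstra's algorithm (8). The function and class names are my own illustrative choices, not part of Koutschan's list; the union-find variant (path halving plus union by size) and the heap-based Dijkstra are common textbook choices among several.

```python
import heapq
from typing import Dict, List, Optional, Sequence, Tuple


def gcd(a: int, b: int) -> int:
    """Euclidean algorithm: replace (a, b) with (b, a mod b) until b is 0."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return an index of target in the sorted sequence items, or None.

    Each comparison rules out half of the remaining range.
    """
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return None


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x: int) -> int:
        # Walk up to the root, pointing each visited node at its grandparent.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x: int, y: int) -> bool:
        """Merge the groups of x and y; return False if already in one group."""
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # attach the smaller tree under the larger one
        self.size[rx] += self.size[ry]
        return True


Graph = Dict[str, List[Tuple[str, float]]]


def dijkstra(graph: Graph, source: str) -> Dict[str, float]:
    """Shortest distances from source; edge weights must be nonnegative."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry: u was already reached more cheaply
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


if __name__ == "__main__":
    print(gcd(1071, 462))                          # 21
    print(binary_search([1, 3, 5, 7, 9, 11], 7))   # 3
    uf = UnionFind(5)
    uf.union(0, 1)
    uf.union(3, 4)
    print(uf.find(0) == uf.find(1), uf.find(1) == uf.find(3))  # True False
    g = {"a": [("b", 4), ("c", 1)], "c": [("b", 2), ("d", 5)], "b": [("d", 1)]}
    print(dijkstra(g, "a"))  # {'a': 0.0, 'b': 3.0, 'c': 1.0, 'd': 4.0}
```

In the Dijkstra example the direct edge a→b (weight 4) loses to the detour a→c→b (1 + 2 = 3), which is exactly the kind of relaxation the priority queue handles.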
  After seeing this list, Daniel Lemire posted his own Top 5:
  • Binary search is the first non-trivial algorithm I remember learning.
  • The Fast Fourier transform (FFT) is an amazing algorithm. Combined with the convolution theorem, it lets you do magic.
  • While hashing is not an algorithm, it is one of the most powerful and useful ideas in Computer Science. It takes minutes to explain it, but years to master.
  • Merge sort is the most elegant sorting algorithm. You can explain it in three sentences to anyone.
  • While not an algorithm per se, the Singular Value Decomposition (SVD) is the most important Linear Algebra concept I don’t remember learning as an undergraduate. (And yes, I went to a good school. And yes, I was an A student.) It can help you invert singular matrices and do other similar magic.
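Two items on Lemire's list lend themselves to short sketches: merge sort, and the FFT combined with the convolution theorem. The theorem says the transform of a convolution equals the pointwise product of the transforms, so multiplying two polynomials becomes transform, multiply pointwise, transform back. The Python below is a minimal sketch assuming a textbook recursive radix-2 Cooley-Tukey FFT with zero-padding to a power of two. The function names are illustrative, and real code would normally call a library FFT instead.

```python
import cmath
from typing import List, Sequence


def merge_sort(xs: Sequence[int]) -> List[int]:
    """Split in half, sort each half recursively, merge the sorted halves."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = merge_sort(xs[:mid]), merge_sort(xs[mid:])
    merged: List[int] = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements in order (stable)
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def fft(a: List[complex], invert: bool = False) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two.

    With invert=True it computes the inverse transform without the 1/n factor.
    """
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def multiply_polynomials(p: Sequence[int], q: Sequence[int]) -> List[int]:
    """Convolve two non-empty integer coefficient lists via the FFT."""
    size = len(p) + len(q) - 1
    n = 1
    while n < size:
        n *= 2  # zero-pad both inputs to a common power-of-two length
    fp = fft([complex(c) for c in p] + [0j] * (n - len(p)))
    fq = fft([complex(c) for c in q] + [0j] * (n - len(q)))
    product = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # Divide by n to finish the inverse transform, then round off float noise.
    return [round(v.real / n) for v in product[:size]]


if __name__ == "__main__":
    print(merge_sort([5, 2, 9, 1, 5, 6]))           # [1, 2, 5, 5, 6, 9]
    print(multiply_polynomials([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]
```

The second example computes (1 + 2x + 3x²)(4 + 5x) = 4 + 13x + 22x² + 15x³ in O(n log n) arithmetic operations instead of the O(n²) of schoolbook multiplication. Rounding back to integers is only safe while the coefficients stay small enough for double precision.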

The Expendables



I will definitely go see this movie packed with action legends.
Check out the trailer.

Stay Hungry. Stay Foolish.



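  “Stay Hungry. Stay Foolish.” is a motivational motto from the 1974 edition of The Whole Earth Catalog. The Whole Earth Catalog was a publication somewhat like a mail-order catalog, listing all kinds of products and their prices. Unlike ordinary mail-order catalogs, it set out to discover and showcase products with fresh ideas and a hippie spirit. In the 1960s and 70s it had considerable influence among young Americans and was called one of the sacred texts of its generation. Today it is often described as "a paper Google from 40 years ago." “Stay Hungry. Stay Foolish.” was the farewell message printed on the back cover of the 1974 edition. In 2005, Apple CEO Steve Jobs gave the commencement address at Stanford University, and he closed that legendary speech with this line.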

  Thinking well of ourselves is a natural flaw of human psychology. Psychological research has found that:

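96% of cancer patients believe they are healthier than other cancer patients;
93% of drivers believe they are more safety-conscious than the average driver;
90% of students believe their intelligence is above average;
94% of professors believe their teaching is above average;
92% of respondents believe they are fairer than most people…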

  This kind of self-overestimation is universal. Psychology has a specific term for it: "illusory superiority."

  We cannot fix an inherent flaw; what we can do is constantly remind ourselves to listen more and ask more questions.

  “Stay Hungry. Stay Foolish.”

  I offer these words to myself, and to all of you.

The key word is learning, of course! Again.




This is the word cloud extracted from the titles of accepted papers at ECMLPKDD 2010. It is also available at Wordle.net. This year ECMLPKDD accepted 120 papers, an acceptance rate of 18%.

A similar word cloud extracted from the ICML 2010 titles is in my previous post; the two make a good comparison.
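The post doesn't say how Wordle weights the words, so the following is only a sketch under a common assumption: a word cloud sizes each word by how often it occurs once stop words are dropped. The stop-word list is a small illustrative one. The titles in the usage example are hypothetical placeholders, not actual ECMLPKDD 2010 papers.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Small illustrative stop-word list (an assumption; real tools use longer ones).
STOP_WORDS = {"a", "an", "and", "for", "from", "in", "of", "on", "the", "to", "via", "with"}


def word_frequencies(titles: Iterable[str], top: int = 10) -> List[Tuple[str, int]]:
    """Count lowercase words across all titles, skipping stop words."""
    counts: Counter = Counter()
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(top)


if __name__ == "__main__":
    # Hypothetical titles, for illustration only.
    titles = [
        "Learning to Rank with Preference Data",
        "Active Learning for Graph Classification",
        "Kernel Methods for Structured Learning",
    ]
    for word, count in word_frequencies(titles, top=3):
        print(word, count)
```

A cloud renderer would then map each count to a font size. Counting on real title lists would also need to decide whether to merge variants such as "learning" and "learn", which this sketch does not do.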

Top 10 Stop Motion Videos on YouTube

  

The video "DEADLINE post-it stop motion" is one of them. It has already drawn 3,858,368 views. The other videos are listed at Mashable.

2009 Chinese Government Award for Outstanding Self-Financed Students Abroad: Award Ceremony


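  This year's ceremony for the Chinese Government Award for Outstanding Self-Financed Students Abroad was held on June 5 at the Chinese Embassy in Berlin. More than 70 people attended, including Ambassador Wu Hongbo, the award recipients, representatives of the supervising professors, representatives of the international offices of universities in the Berlin area, the responsible officials of the embassy's Education Section, and the education consuls of the Consulates General in Frankfurt and Munich. Minister-Counselor Jiang Feng of the Education Section presided. Ambassador Wu presented the awards and gave a speech. The program of the ceremony: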

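11:00 Guests arrive (violin trio playing works by Dvořák)
11:35 Minister-Counselor Jiang Feng opens the ceremony
11:40 Speech by Ambassador Wu Hongbo
12:10 Speech by a representative of the supervising professors
Prof. Dr.-Ing. Joerg Wallaschek, University of Hannover
12:20 Speech by a representative of the award recipients
Weiwei Cheng (程蔚蔚), University of Marburg
12:30 Ambassador Wu presents award certificates to the 27 recipients
12:45 Ambassador Wu presents souvenirs to six German representatives, including the supervising professors; group photo
12:50 Group photo of the recipients with Ambassador Wu and embassy staff
Minister-Counselor Jiang Feng thanks the musicians from the Berlin University of the Arts and presents them with flowers
13:00 Reception lunch
14:00 End of the ceremony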

  Below is the speech I gave at the ceremony:

Your Excellency Ambassador, distinguished guests, dear fellow award recipients, ladies and gentlemen:

Good day. I am Weiwei Cheng from the Department of Mathematics and Computer Science at the University of Marburg.

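Receiving the 2009 Chinese Government Award for Outstanding Self-Financed Students Abroad here today, my heart is full of gratitude. On behalf of the thirty-seven award recipients studying in Germany, I would like to express our most sincere thanks to the people of our homeland for their warm care! Many thanks to Ambassador Wu for his concern and encouragement; receiving this honor from your hands is a great privilege for us. We thank the Ministry of Education and the China Scholarship Council for establishing this award for students abroad. It recognizes our studies and research, and even more it encourages and inspires us, a constant reminder not to slacken in the slightest. We thank the Education Section of the Embassy in Germany and the experts of the review committee in China for their trust and support. In the days ahead, we will remember this trust and keep pushing ourselves forward.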

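I want to thank my doctoral advisor, Professor Eyke Hüllermeier of the Department of Mathematics and Computer Science at the University of Marburg, for his generous guidance in machine learning over these years. This honor would not have been possible without his day-to-day teaching. I also want to thank my colleagues in the Knowledge Engineering and Bioinformatics lab at the University of Marburg for their long-standing collaboration and help. And of course my parents back in China: they are even more excited about this award than I am. Today's honor belongs to them as well.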

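In the blink of an eye, I have been in Germany for six years. Germany is a remarkable land that produced scientific giants such as Gauss and Einstein, whose names will live on in history. For many years, Chinese students in Germany have immersed themselves in its rigorous, truth-seeking academic atmosphere. Among them were renowned scholars such as Ji Xianlin and Qiu Fazu, who made outstanding contributions to China's scientific and technological development. I am deeply honored and proud to follow in their footsteps by studying and working in Germany. As a PhD student in computer science, my daily work centers on analyzing and understanding the data around us and using it to improve people's lives. To that end, my research focuses on machine learning. Machine learning is a branch of artificial intelligence. Its main task, simply put, is to develop computer programs that accumulate experience and learn by themselves: we design algorithms that acquire knowledge by analyzing data. After more than fifty years of research, machine learning has grown into a fairly complete discipline and has produced a series of significant statistical and computational theories. Machine learning algorithms are widely applied in high-tech fields such as speech recognition, computer graphics and genetic analysis. Moreover, machine learning has become one of the driving forces and core technologies behind the leap in information technology in the Internet era. I feel very lucky to have the chance to witness and take part in such a great technological transformation.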

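Every discipline has its own research focus, but there is no doubt that producing valuable research results requires a broad perspective and active exchange with scholars from other fields, including the natural sciences, social sciences, engineering, medicine and many more. I believe that, in a sense, the Award for Outstanding Self-Financed Students Abroad gives us exactly this opportunity: it brings together many outstanding scholars from our homeland. We should make good use of it, strengthen our exchanges, and turn the award in our hands into a catalyst for research.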

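Dear fellow students, behind every award presented today lie your hard work and sweat; each one was earned through your own diligence and steady effort. This award is a brilliant view along the path of our lives: splendid indeed, but not our final destination. We should keep rewards and honors in proper perspective: study earnestly, keep striving, never give up lightly, and yet not dwell on gains and losses. We live in a great moment where challenges and opportunities coexist, and for science, for our homeland and for the world, each of us bears responsibilities we cannot shirk. A few years of study pass in an instant; how should we move forward? Only by being diligent and down-to-earth, and keeping an even mind, can we achieve even better results and live with a clear conscience.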

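With grateful hearts, let us strive to excel. Though our homeland is thousands of miles away, it will always be in our hearts. At this moment, what fills our hearts is not the joy of reaching a finish line but the exhilaration of setting sail. On the journey ahead, I will remember how I feel today, forge ahead, and never stop striving. In the words of Qu Yuan: "The road ahead is long and winding; I shall search high and low."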

Thank you all.




Weiwei Cheng

Interests
Reading, Mandarin debate, PC games, table tennis
Machine learning researcher, consultant, PhD candidate at University of Marburg
